KR20180058159A

KR20180058159A - Method and apparatus for predicting probability of the outbreak of a disease

Info

Publication number: KR20180058159A
Application number: KR1020160176525A
Authority: KR
Inventors: 채명훈; 최상훈; 박서진; 이관홍
Original assignee: 주식회사 셀바스에이아이
Priority date: 2016-11-23
Filing date: 2016-12-22
Publication date: 2018-05-31
Also published as: KR101885111B1

Abstract

The present invention relates to a method and an apparatus for predicting the onset of a disease. According to an embodiment of the present invention, the method for predicting the onset of a disease comprises the steps of: receiving original data including a plurality of items from at least one external database; generating, on the basis of the original data and in accordance with a predetermined criterion, processed data representing one medical treatment or one health examination as one event; inputting the processed data into a disease onset prediction model; and calculating a disease onset probability for at least one disease using the disease onset prediction model. According to the present invention, by representing health-related data having different forms as one event, it is possible to input various data into a disease onset prediction model.

Description


The present invention relates to a method and apparatus for predicting disease onset, and more particularly, to a disease onset prediction method and apparatus that calculate a disease onset probability using received health-related data and a disease onset prediction model.

Recently, the probability of disease onset has risen sharply owing to increased consumption of instant food and fast food that are harmful to the body, lack of physical activity, and overwork. In particular, the incidence of cardiovascular diseases such as hypertension, ischemic heart disease, coronary artery disease, and arteriosclerosis is increasing rapidly.

Accordingly, disease risk assessment is used to prevent and manage cardiovascular disease, and a variety of clinical decision-making tools are used for such assessment. One example is the Framingham risk score, an index that evaluates the risk of developing cardiovascular disease from several cardiovascular risk factors such as sex, age, systolic blood pressure, smoking, diabetes, total cholesterol, and HDL cholesterol. However, because patients with a history of cardiovascular disease have a high risk of recurrence, the Framingham risk score, which does not take past history into account, is limited as a measure of disease risk. In addition, because the Framingham risk score was developed abroad, it needs to be calibrated for Koreans according to the average disease incidence and the level of exposure to risk factors in Korea. Risk assessment tools calibrated for Koreans currently exist, but the basis for their criteria for selecting high-risk groups is insufficient and they do not play a major role in screening high-risk groups, so they are not widely used clinically.

[Related Art Documents]

Periodontal disease prediction system and periodontal disease prediction method using the same (Korean Patent Laid-Open Publication No. 10-2016-0083502)

At present, the medical field predicts disease onset using only a single factor, or uses a plurality of factors only statistically, and is limited in its ability to filter a plurality of factors and extract the essential ones. Therefore, if Korean medical data are used and factors extracted through machine learning from the plurality of factors contained in the medical data are considered in multidimensional form, much higher accuracy can be obtained and, furthermore, a disease onset prediction model suited to Koreans can be implemented.

An object of the present invention is to provide a disease onset prediction method and apparatus capable of inputting various data into a disease onset prediction model by representing health-related data of differing forms as one event.

Another object of the present invention is to provide a disease onset prediction method and apparatus capable of increasing the accuracy of the disease onset probability by processing received health-related data in various ways and inputting the result into a disease onset prediction model.

The objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood by those skilled in the art from the following description.

To solve the problems described above, a disease onset prediction method according to an embodiment of the present invention includes: receiving original data including a plurality of items from at least one external database; generating, based on the original data and according to a predetermined criterion, processed data representing one medical treatment or one health examination as one event; inputting the processed data into a disease onset prediction model; and calculating a disease onset probability for at least one disease using the disease onset prediction model.

According to another feature of the present invention, the disease is at least one of cardiovascular disease, stomach cancer, liver cancer, colorectal cancer, lung cancer, breast cancer, prostate cancer, dementia, or diabetes, and a disease onset prediction model may be built separately for each disease.

According to still another feature of the present invention, the step of receiving the original data may be a step of receiving one or more of sociological data, medical record data including at least one medical treatment, and health examination data including at least one health examination.

According to still another feature of the present invention, the step of generating the processed data may further include, when a plurality of original data exist for one treatment date, integrating the original data into one event for that treatment date.
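
The integration of several same-day records into one event can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the specification: the field names `person_id`, `care_date`, `drug_class_code`, and `drug_dose` are hypothetical, and collecting same-day values into lists is only one possible way to integrate them.

```python
from collections import defaultdict


def merge_into_events(records):
    """Merge records sharing a person and treatment date into one event."""
    grouped = defaultdict(lambda: defaultdict(list))
    for record in records:
        key = (record["person_id"], record["care_date"])
        for name, value in record.items():
            if name not in ("person_id", "care_date"):
                # Keep every value so same-day prescriptions are not lost.
                grouped[key][name].append(value)
    return [
        {"person_id": pid, "care_date": date, **dict(items)}
        for (pid, date), items in sorted(grouped.items())
    ]


# Hypothetical original data: two records on the same treatment date.
records = [
    {"person_id": 1, "care_date": "2015-03-02", "drug_class_code": "A01", "drug_dose": 10.0},
    {"person_id": 1, "care_date": "2015-03-02", "drug_class_code": "B07", "drug_dose": 5.0},
    {"person_id": 1, "care_date": "2015-04-10", "drug_class_code": "A01", "drug_dose": 10.0},
]
events = merge_into_events(records)
```

Here the three records collapse into two events, one per treatment date, with both medication codes of the first date kept inside a single event.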

According to still another feature of the present invention, one event may include data on a medication classification code and a medication dosage.

According to still another feature of the present invention, the disease onset prediction method may further include filtering, from among the plurality of items, items associated with disease onset.

According to still another feature of the present invention, the items associated with disease onset may number at least 50.

According to still another feature of the present invention, the step of generating the processed data includes: determining whether a missing event exists among the events; when a missing event exists, generating at least one of a representative value, a mean value, or an interpolated value for the missing event; and inputting at least one of the representative value, the mean value, or the interpolated value into the missing event.

According to still another feature of the present invention, the step of generating the processed data may include: determining whether missing data exist in the plurality of items included in an event; when missing data exist, generating at least one of a representative value, a mean value, or an interpolated value for the missing data; and inputting at least one of the representative value, the mean value, or the interpolated value into the missing data.
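
Filling a missing value with a mean or an interpolated value can be sketched as follows. This is a hedged example, not the specification's procedure: it assumes one item is held as a time-ordered list in which `None` marks a missing entry, uses linear interpolation as one possible interpolation scheme, and copies the nearest observed value for gaps at either end. The function names are hypothetical.

```python
def fill_with_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def fill_with_interpolation(values):
    """Linearly interpolate None entries between observed neighbours.

    Gaps at the start or end are filled with the nearest observed value
    (an assumption; the source does not say how edges are handled).
    """
    result = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return result
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            result[i] = values[right]
        elif right is None:
            result[i] = values[left]
        else:
            fraction = (i - left) / (right - left)
            result[i] = values[left] + fraction * (values[right] - values[left])
    return result


# Hypothetical systolic blood pressure readings over three events.
filled = fill_with_interpolation([120.0, None, 130.0])
```

The same two helpers apply whether the gap is a single missing item inside an event or every item of a missing event, since a missing event can be treated as a row in which all items are `None`.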

According to still another feature of the present invention, the step of generating the processed data includes: calculating a distribution based on the frequency of event lengths; and generating the processed data so as to include only events corresponding to a predetermined threshold value in the distribution, wherein the threshold value may be the event length located at the 95% region, counted from left to right, with respect to the center of the distribution.
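
One possible reading of this length threshold is sketched below. The assumptions are loud ones: "length" is taken to mean the number of events in one person's sequence, and the threshold is taken to be the smallest length at which the cumulative frequency reaches 95%. The source does not confirm either choice, and the function names are hypothetical.

```python
from collections import Counter


def length_threshold(lengths, coverage=0.95):
    """Return the smallest length whose cumulative frequency reaches coverage."""
    counts = Counter(lengths)
    total = len(lengths)
    cumulative = 0
    for length in sorted(counts):
        cumulative += counts[length]
        if cumulative / total >= coverage:
            return length
    raise ValueError("lengths must not be empty")


def filter_by_length(sequences, coverage=0.95):
    """Keep only the event sequences no longer than the threshold length."""
    threshold = length_threshold([len(s) for s in sequences.values()], coverage)
    return {pid: seq for pid, seq in sequences.items() if len(seq) <= threshold}
```

Under this reading, unusually long event sequences in the right tail of the distribution are excluded from the processed data.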

According to still another feature of the present invention, the step of generating the processed data may include: calculating the mean and standard deviation of the data of the plurality of items included in the events; converting the data of the plurality of items into z-scores using the mean and standard deviation; and inputting the z-scores into the data of the plurality of items.
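
The z-score conversion, z = (x - mean) / standard deviation, can be sketched for one item as follows. The example assumes the population standard deviation, since the source does not say whether the population or sample form is used, and it returns zeros for a constant item to avoid division by zero. The function name is hypothetical.

```python
from statistics import mean, pstdev


def to_z_scores(values):
    """Convert the values of one item to z-scores."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        # A constant item carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical total cholesterol values across three events.
z = to_z_scores([180.0, 200.0, 220.0])
```

After conversion each item has mean 0 and unit variance, so items recorded on very different scales become comparable as model inputs.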

According to still another feature of the present invention, the step of generating the processed data may include: extracting the unit corresponding to each of the plurality of items; and converting each unit into the unit defined in the processed data.
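
A unit conversion step of this kind might look like the sketch below. The item names, target units, and conversion table are all hypothetical, chosen only because the factors are unambiguous; the specification does not list which units are defined for the processed data.

```python
# Hypothetical target units for the processed data.
TARGET_UNITS = {"height": "cm", "weight": "kg"}

# Hypothetical multiplication factors from a source unit to the target unit.
TO_TARGET = {
    ("height", "m"): 100.0,
    ("height", "cm"): 1.0,
    ("weight", "g"): 0.001,
    ("weight", "kg"): 1.0,
}


def convert_item(item, value, unit):
    """Return (value, unit) expressed in the unit defined for the item."""
    try:
        factor = TO_TARGET[(item, unit)]
    except KeyError:
        raise ValueError(f"no conversion defined for {item!r} in {unit!r}") from None
    return value * factor, TARGET_UNITS[item]
```

Rejecting unknown units with an error, rather than passing the value through unchanged, keeps mixed-unit values from silently entering the model input.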

According to still another feature of the present invention, the step of generating the processed data may include generating the processed data so as to include only part of the data of the plurality of items.

According to still another feature of the present invention, the step of calculating the disease onset probability may be a step of calculating at least one of a probability that a disease will develop or an onset probability for each type of disease.

According to still another feature of the present invention, the disease onset prediction method may further include calculating a body age or a life expectancy using the disease onset prediction model.

To solve the problems described above, a disease onset prediction apparatus according to an embodiment of the present invention includes: a communication unit configured to receive original data including a plurality of items from at least one external database; a processor configured to generate, based on the original data and according to a predetermined criterion, processed data representing one medical treatment or one health examination as one event; and a storage unit that stores the original data and the processed data, wherein the processor is configured to input the processed data into a disease onset prediction model and to calculate a disease onset probability for at least one disease using the disease onset prediction model.

According to another feature of the present invention, the communication unit may be configured to receive one or more of sociological data, medical record data including at least one medical treatment, and health examination data including at least one health examination.

According to still another feature of the present invention, the processor may be configured to determine whether a missing event exists among the events, to generate, when a missing event exists, at least one of a representative value, a mean value, or an interpolated value for the missing event, and to input at least one of the representative value, the mean value, or the interpolated value into the missing event.

According to still another feature of the present invention, the processor may be configured to determine whether missing data exist in the plurality of items included in an event, to generate, when missing data exist, at least one of a representative value, a mean value, or an interpolated value for the missing data, and to input at least one of the representative value, the mean value, or the interpolated value into the missing data.

According to still another feature of the present invention, the processor is configured to calculate a distribution based on the frequency of event lengths and to generate the processed data so as to include only events corresponding to a predetermined threshold value in the distribution, wherein the threshold value may be the event length located at the 95% region, counted from left to right, with respect to the center of the distribution.

According to still another feature of the present invention, the processor may be configured to calculate the mean and standard deviation of the data of the plurality of items included in the events, to convert the data of the plurality of items into z-scores using the mean and standard deviation, and to input the z-scores into the data of the plurality of items.

According to still another feature of the present invention, the processor may be configured to extract the unit corresponding to each of the plurality of items and to convert each unit into the unit defined in the processed data.

Specific details of other embodiments are included in the detailed description and the drawings.

The present invention has the effect of providing a disease onset prediction method and apparatus capable of inputting various data into a disease onset prediction model by representing health-related data of differing forms as one event.

The present invention also has the effect of providing a disease onset prediction method and apparatus capable of increasing the accuracy of the disease onset probability by processing received health-related data in various ways and inputting the result into a disease onset prediction model.

The effects of the present invention are not limited to those exemplified above, and further effects are included in this specification.

FIG. 1 is a schematic diagram for explaining a method for predicting a disease onset probability according to an embodiment of the present invention.
FIG. 2 is a block diagram showing a schematic configuration of a disease onset prediction apparatus according to an embodiment of the present invention.
FIG. 3 is a flowchart showing a procedure for calculating a disease onset probability by a disease onset prediction method according to an embodiment of the present invention.
FIGS. 4A and 4B are schematic diagrams showing processed data tables in which data are integrated into one event per treatment date according to an embodiment of the present invention.
FIGS. 5A and 5B are schematic diagrams showing processed data tables in which missing events have been calculated and entered according to an embodiment of the present invention.
FIGS. 6A and 6B are schematic diagrams showing processed data tables in which missing data have been calculated and entered according to an embodiment of the present invention.
FIGS. 7A and 7B are schematic diagrams showing processed data tables in which the values of a plurality of items have been normalized and entered according to an embodiment of the present invention.
FIGS. 8A and 8B are schematic diagrams showing processed data tables in which the values of a plurality of items have been converted into defined units and entered according to an embodiment of the present invention.
FIG. 9 shows a screen that provides a disease onset probability according to an embodiment of the present invention.
FIGS. 10A and 10B show screens that provide health findings and suitability for insurance enrollment.

The advantages and features of the present invention, and the methods of achieving them, will become apparent with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms; the embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the invention pertains are fully informed of the scope of the invention, and the present invention is defined only by the scope of the claims.

The shapes, sizes, ratios, angles, numbers, and the like shown in the drawings for describing embodiments of the present invention are illustrative, and the present invention is not limited to what is illustrated. In describing the present invention, detailed descriptions of related known art are omitted where they are judged to unnecessarily obscure the gist of the invention. Where 'includes', 'has', 'consists of', and the like are used in this specification, other parts may be added unless 'only' is used. A component expressed in the singular includes the plural unless explicitly stated otherwise.

In interpreting a component, it is construed as including a margin of error even where there is no separate explicit statement.

Although 'first', 'second', and the like are used to describe various components, the components are not limited by these terms. These terms are used only to distinguish one component from another. Accordingly, a first component mentioned below may be a second component within the technical spirit of the present invention.

Unless otherwise specified, the same reference numerals refer to the same components throughout the specification.

The features of the various embodiments of the present invention may be partially or wholly coupled or combined with one another and, as those skilled in the art will fully understand, may be technically interlocked and operated in various ways; the embodiments may be practiced independently of one another or together in association.

For convenience of description, FIGS. 1 to 8B describe the disease onset probability in terms of the onset probability of cardiovascular disease; however, the description is not limited thereto, and the onset probability of cardiovascular disease, stomach cancer, colorectal cancer, liver cancer, lung cancer, breast cancer, prostate cancer, dementia, or diabetes can be predicted by substantially the same process.

FIG. 1 is a schematic diagram for explaining a method for predicting a disease onset probability according to an embodiment of the present invention.

Referring to FIG. 1, the disease onset probability providing system (1000) is a system that calculates a disease onset probability (300) by inputting processed data (100) into a disease onset prediction model (200).

The processed data (100) are data obtained by processing original data received from an external database, and are processed so as to contain one event formed by integrating the original data according to a predetermined criterion. The processed data (100) include at least one event. An event is defined as a medical activity associated with the disease onset probability. Here, the disease may be cardiovascular disease, cancer, dementia, or diabetes. For example, an event may be defined as a treatment, a prescription, or a health examination at a hospital. One event may also include a treatment and a prescription made on the same date. The number of processed data (100) and the number of events included in the processed data (100) are not limited.

The disease onset prediction model (200) is a model that processes input data to calculate a result value. Here, the input data are the processed data (100), and the result value may be the disease onset probability (300). The disease onset prediction model (200) may receive a plurality of processed data (100) and calculate a disease onset probability (300) corresponding to each of them. Furthermore, the disease onset prediction model (200) may process the plurality of processed data (100) to calculate a single disease onset probability (300) for the plurality of processed data (100).

The disease onset probability (300) is a value for the probability that a disease will develop and is calculated by the disease onset prediction model (200). The disease onset probability (300) may be a plurality of disease onset probabilities (300), each corresponding to one of the plurality of processed data (100), and a single disease onset probability (300) corresponding to the plurality of processed data (100) as a whole.

Hereinafter, for a more detailed description of the disease onset prediction method in the disease onset probability prediction apparatus (400) that implements the disease onset prediction model, reference is also made to FIG. 2.

FIG. 2 is a block diagram showing a schematic configuration of a disease onset probability prediction apparatus according to an embodiment of the present invention. For convenience of description, it is described with reference to FIG. 1.

Referring to FIG. 2, the disease onset probability prediction apparatus (400) includes a communication unit (410), a processor (420), and a storage unit (430).

The communication unit (410) of the disease onset probability prediction apparatus (400) is configured to receive original data including a plurality of items from at least one external database. Here, the external data may be data from the health examination cohort database of the National Health Insurance Service or from the treatment database of a medical institution. The health examination cohort database contains data on medical statements, treatment details, diagnosis details, prescription details, and the like for all health insurance subscribers and medical aid recipients. In addition, the communication unit (410) may provide the calculated disease onset probability to medical institutions, insurers, and individuals.

The processor (420) of the disease onset probability prediction apparatus (400) is configured to generate, based on the original data and according to a predetermined criterion, processed data representing one medical treatment or one health examination as one event. The processor (420) generates the processed data so as to increase the accuracy of the disease onset probability to be calculated. Specifically, when a missing event exists among a plurality of events, the processor (420) may generate the missing event, and when missing data exist in an item included in an event, it may likewise generate the missing data. Furthermore, the processor (420) calculates a distribution based on the frequency of event lengths and generates the processed data so as to include only events corresponding to a predetermined threshold value in the distribution. Here, the threshold value is the event length located at the 95% region, counted from left to right, with respect to the center of the distribution. In addition, the processor (420) extracts the unit corresponding to each of the plurality of items and converts each unit into the unit defined in the processed data. Furthermore, the processor (420) inputs the processed data into the disease onset prediction model and calculates the disease onset probability using the disease onset prediction model.

The storage unit (430) of the disease onset probability prediction apparatus (400) stores received data and generated data. Specifically, the storage unit (430) stores the original data received from the external database and the processed data generated from the original data, and furthermore stores the calculated disease onset probability.

이하에서는 질환 발병 확률 예측 장치 (400) 에서의 질환 발병 예측 방법에 대한 보다 상세한 설명을 위해 도 3을 함께 참조한다.Hereinafter, FIG. 3 will be referred to for a more detailed description of the disease onset prediction method in the disease occurrence probability predicting apparatus 400. FIG.

도 3은 본 발명의 일 실시예에 따른 질환 발병 예측 방법에 따라 질환 발병 확률을 산출하는 절차를 도시한 순서도이다. 설명의 편의를 위해 도 1 및 도 2의 구성 요소들과 도면 부호를 참조하여 설명한다.3 is a flowchart illustrating a procedure for calculating a disease occurrence probability according to a disease occurrence prediction method according to an embodiment of the present invention. For convenience of explanation, the components will be described with reference to FIG. 1 and FIG. 2 and reference numerals.

질환 발병 확률 예측 장치 (400) 의 통신부 (410) 는 적어도 하나의 외부 데이터베이스로부터 복수의 항목을 포함하는 원본 데이터를 수신한다 (S310).The communication unit 410 of the disease occurrence probability prediction apparatus 400 receives original data including a plurality of items from at least one external database (S310).

구체적으로, 통신부 (410) 는 사회학적 데이터, 적어도 1회의 진료를 포함하는 진료 기록 데이터 및 적어도 1회의 건강 검진을 포함하는 건강 검진 데이터 중 하나 이상을 수신한다. 여기서, 사회학적 데이터는 건강 보험 가입자 및 의료 급여 수급권자의 건강 보장 자격 정보로, 성, 연령, 거주 지역과 같은 인구 사회학적 정보, 사망일자, 사망원인을 포함하는 사망관련 정보, 건강보험 가입 여부, 의료급여 지급 여부와 같은 건강보장 유형 및 소득 분위 및 장애 등록 정보를 포함하는 사회 경제적 수준 및 기타 정보를 포함한다. 또한, 진료 기록 데이터는 요양 급여 비용 명세서 상의 의료 이용 내역 및 의료비 발생 내역을 의미한다. 진료 기록 데이터는 의료 기관 이용 정보, 요양 급여 비용, 진료 과목, 진료 상병 정보, 진찰, 처치, 수술, 기타 행위 급여 내역, 치료 재료 등의 상세 진료 내역을 포함한다. 구체적인 원본 데이터의 특징, 외부 데이터베이스에서의 필드명은 표 1과 같다.Specifically, the communication unit 410 receives at least one of sociological data, medical record data including at least one medical examination, and health examination data including at least one medical examination. Here, the sociological data is the health insurance qualification information of the health insurance subscribers and the medical benefit beneficiaries. It includes demographic information such as sex, age, residence area, death date, death related information including the cause of death, Economic status and other information, including types of health care, such as whether health care payments are made, and income quintiles and disability registration information. In addition, the medical record data refers to medical use history and medical fee occurrence details on the medical care benefit cost statement. The medical record data includes the details of medical treatment such as medical institution use information, medical care cost, medical treatment subject, medical illness information, medical examination, treatment, operation, other behavioral benefit details, and therapeutic materials. Table 1 shows the characteristics of the original data and field names in the external database.

Furthermore, of the external databases, only data from the health checkup cohort database on people under 80 years of age with no history of disease or cancer is used as the original data. Because diverse original data is received, the loss of disease prediction accuracy caused by environmental factors that differ by region, culture, and era can be mitigated by collecting additional data, by generating multiple region-specific disease prediction models, and by similar measures.

Subsequently, the processor 420 generates, based on the original data and according to a predetermined criterion, processed data that represents one medical treatment or one health checkup as one event (S320).

Specifically, the processor 420 organizes the plurality of items included in the original data into one event per medical treatment or health checkup and generates the processed data according to the predetermined criterion. For example, the processor 420 classifies items such as the personal serial number, drug classification code, and drug dosage by the care start date of a single day, that is, by one medical treatment or one health checkup, so that they form one event. One event includes data on the drug classification code and the drug dosage. At this time, the processor 420 filters the plurality of items included in the original data for the items associated with disease occurrence. For example, the processor 420 may filter for the items corresponding to the drug classification codes and drug dosages associated with a disease. There are at least 50 items associated with disease occurrence.

In another embodiment, when a plurality of original data records exist for one treatment date, the processor 420 may merge them into one event for that treatment date. For example, when one treatment date has a plurality of drug classification codes, each with its own dosage, the processor 420 may merge the drug classification codes and dosages into one event corresponding to that treatment date.

Meanwhile, in yet another embodiment, the processor 420 determines whether any event is missing from the plurality of events. If a missing event exists, the processor 420 generates at least one of a representative value, a mean value, or an interpolated value for the missing event and enters it into the event. For example, when health checkups exist for 2003, 2005, and 2009, that is, three events, the processor 420 determines that the events for 2004, 2006, 2007, and 2008 are missing. The processor 420 therefore generates at least one of a representative value, a mean value, or an interpolated value for the 2004, 2006, 2007, and 2008 events. Specifically, the processor 420 may use the items included in the 2003, 2005, and 2009 events, for example age, BMI, and blood pressure, to generate at least one of a representative value, a mean value, or an interpolated value for age, BMI, and blood pressure. The processor 420 then enters the generated value into the age, BMI, and blood pressure items of the 2004, 2006, 2007, and 2008 events.
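The missing-event step can be sketched as follows. This is a minimal illustration that assumes linear interpolation between the nearest observed years; the source allows a representative value or a mean as alternatives and does not prescribe a formula. The function name `fill_missing_years` and all numerical values are hypothetical.

```python
def fill_missing_years(events):
    """events: {year: {item: value}}. Returns a dict with every year between
    the first and last observed year present, using linear interpolation
    (an assumed choice) for the years that were not observed."""
    years = sorted(events)
    filled = {y: dict(events[y]) for y in years}
    for left, right in zip(years, years[1:]):
        for year in range(left + 1, right):
            weight = (year - left) / (right - left)
            filled[year] = {
                item: events[left][item]
                + weight * (events[right][item] - events[left][item])
                for item in events[left]
            }
    return dict(sorted(filled.items()))


# Hypothetical checkup values for 2003, 2005, and 2009 (not from the source).
observed = {
    2003: {"age": 40, "bmi": 24.0, "sbp": 120.0},
    2005: {"age": 42, "bmi": 25.0, "sbp": 124.0},
    2009: {"age": 46, "bmi": 27.0, "sbp": 132.0},
}
filled = fill_missing_years(observed)
```

Here `filled` contains one event for each year from 2003 to 2009, with the 2004, 2006, 2007, and 2008 events generated from their neighbors.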

In various embodiments, the processor 420 determines whether data is missing from any item included in an event. If missing data exists, the processor 420 generates at least one of a representative value, a mean value, or an interpolated value for the missing data. For example, if the processor 420 determines that, among the items included in a patient's 2004, 2005, and 2006 events, the height data is missing from the 2006 event, it uses the height data of the 2004 and 2005 events to generate at least one of a representative value, a mean value, or an interpolated value. The processor 420 then enters the generated value into the height item of the 2006 event, where the data was missing.
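A minimal sketch of filling a missing item inside an existing event is shown below. It assumes the mean of the person's own observed values is used as the fill value, which is only one of the options the text lists. The helper `impute_missing`, the item name `height_cm`, and the height values are hypothetical.

```python
from statistics import mean


def impute_missing(events, item):
    """Replace None entries of `item` with the mean of the observed values
    (mean imputation is an assumed choice for this sketch)."""
    observed = [e[item] for e in events.values() if e[item] is not None]
    fill_value = mean(observed)
    return {
        year: {**e, item: fill_value if e[item] is None else e[item]}
        for year, e in events.items()
    }


# Hypothetical heights: observed in 2004 and 2005, missing in 2006.
events = {
    2004: {"height_cm": 170.0},
    2005: {"height_cm": 171.0},
    2006: {"height_cm": None},
}
imputed = impute_missing(events, "height_cm")
```

The 2004 and 2005 values are left unchanged; only the 2006 height item receives the generated value.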

Meanwhile, in various embodiments, the processor 420 computes a distribution from the frequency of event lengths and generates the processed data so that it includes only the events that fall within a predetermined threshold of that distribution. Here, the threshold is the event length located at the 95% region of the distribution, measured from left to right with respect to the center of the distribution. When the number of events is large and the event lengths are accordingly distributed toward high values, temporal precision increases. As temporal precision increases, the processed data grows in size and strongly affects the disease occurrence probability, so the number of events may need to be adjusted according to the date distribution.
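One possible reading of this step is sketched below. It assumes that the "event length" is the number of events per person, that the threshold is the nearest-rank 95th percentile of those lengths, and that longer sequences are trimmed to their most recent events. None of these details is confirmed by the source, and the names `length_cutoff` and `truncate_sequences` are hypothetical.

```python
def length_cutoff(lengths, coverage_pct=95):
    """Nearest-rank percentile: the smallest length that covers
    `coverage_pct` percent of all sequences."""
    ordered = sorted(lengths)
    rank = (coverage_pct * len(ordered) + 99) // 100  # integer ceiling
    return ordered[rank - 1]


def truncate_sequences(sequences, coverage_pct=95):
    """Keep at most `cutoff` of the most recent events per person (assumed policy)."""
    cutoff = length_cutoff([len(s) for s in sequences.values()], coverage_pct)
    trimmed = {
        pid: seq[max(0, len(seq) - cutoff):] for pid, seq in sequences.items()
    }
    return trimmed, cutoff


# Hypothetical data: 19 people with 3 events each and one person with 40 events.
sequences = {f"P{i:02d}": list(range(3)) for i in range(1, 20)}
sequences["P20"] = list(range(40))
trimmed, cutoff = truncate_sequences(sequences)
```

With this data the cutoff is 3, so the single 40-event sequence is reduced to its last three events while the other sequences are unchanged.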

In another embodiment, the processor 420 calculates the mean and standard deviation of the data of the plurality of items included in the events. The processor 420 then uses the calculated mean and standard deviation to convert the data of each item into z-scores and enters the z-scores as the data of those items. By converting the item data into z-scores in this way, the processor 420 can normalize the data of each item.
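The z-score conversion is z = (x - mean) / standard deviation, computed per item. The sketch below assumes the population standard deviation; the source does not say whether the population or sample form is used. The function name and the blood pressure values are hypothetical.

```python
from statistics import mean, pstdev


def to_z_scores(values):
    """Convert one item's values to z-scores using the population
    standard deviation (an assumption of this sketch)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so every z-score is defined as 0 here.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical systolic blood pressure values in mmHg.
z = to_z_scores([110.0, 120.0, 130.0])
```

After conversion, every item has mean 0 and unit variance, so items originally recorded in kg/m^2 and mmHg are placed on a comparable scale.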

In yet another embodiment, the processor 420 extracts the unit of each of the plurality of items. For example, the processor 420 extracts m and kg, the units of height and weight. The processor 420 then converts each unit into the unit defined for the processed data. For example, if the units defined for the processed data are ft and lb, the processor 420 converts the units of the height and weight items from m to ft and from kg to lb. In other words, by converting the units of the plurality of items, the processor 420 can unify the units when a single item has been recorded in different units.
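The m-to-ft and kg-to-lb example can be sketched as below. The conversion constants are the exact international definitions of the foot and the pound; the function and field names are hypothetical.

```python
M_PER_FT = 0.3048        # exact by definition of the international foot
KG_PER_LB = 0.45359237   # exact by definition of the avoirdupois pound


def convert_event(event):
    """Convert a height in m and a weight in kg to ft and lb."""
    return {
        "height_ft": event["height_m"] / M_PER_FT,
        "weight_lb": event["weight_kg"] / KG_PER_LB,
    }


# Hypothetical event values chosen so the results are round numbers.
converted = convert_event({"height_m": 1.8288, "weight_kg": 45.359237})
```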

Subsequently, the processor 420 inputs the processed data into the disease occurrence prediction model (S330).

At this time, the processor 420 inputs at least one set of processed data into the disease occurrence prediction model, which is an algorithm for calculating the disease occurrence probability. The processed data may include a plurality of events.

Next, the processor 420 calculates a disease occurrence probability using the disease occurrence prediction model (S340).

Here, the disease occurrence prediction model is trained on the input processed data by machine learning and calculates the disease occurrence probability by applying the parameters determined as a result of the training. The processor 420 may calculate a disease occurrence probability for each of the plurality of events included in the processed data, or it may calculate a single disease occurrence probability that integrates the plurality of events. Furthermore, the processor 420 may calculate the occurrence probability by type of disease. That is, the processor 420 calculates at least one of the probability of developing any of hypertension, angina pectoris, myocardial infarction, stroke, stomach cancer, colorectal cancer, lung cancer, breast cancer, prostate cancer, dementia, diabetes, and the like, or the probability of developing each of these diseases individually. A separate disease occurrence prediction model may be generated and used for each disease, and each such model may be generated through machine learning in any manner, without limitation. In addition, a single disease occurrence prediction model may be implemented to calculate the occurrence probabilities of multiple diseases. Furthermore, a plurality of disease occurrence prediction models may be implemented to calculate the occurrence probability of a single disease. The calculated disease occurrence probability, or the occurrence probability by type of disease, may be provided to individuals, insurers, medical institutions, the national health insurance corporation, and the like.
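The idea of one trained model per disease producing a probability for each event can be sketched as follows. The source does not name a model type, so a logistic model is assumed here purely for illustration; the weights, biases, feature names, and disease list are all hypothetical and are not trained parameters.

```python
import math


def sigmoid(x):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def event_probability(event, weights, bias):
    """Probability for one event under an assumed logistic model."""
    return sigmoid(bias + sum(weights[k] * event[k] for k in weights))


# Hypothetical per-disease parameters standing in for learned ones.
models = {
    "hypertension": ({"bmi_z": 0.8, "sbp_z": 1.2}, -1.0),
    "diabetes": ({"bmi_z": 1.0, "sbp_z": 0.2}, -2.0),
}

# Hypothetical z-scored events for one person.
events = [{"bmi_z": 0.0, "sbp_z": 0.0}, {"bmi_z": 1.0, "sbp_z": 0.5}]

# One probability per event and per disease.
per_event = {
    disease: [event_probability(e, w, b) for e in events]
    for disease, (w, b) in models.items()
}
```

A single integrated probability per person could be derived from the per-event values, but the source does not specify how, so that step is left out of the sketch.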

Accordingly, by inputting the processed data obtained from the original data into the disease occurrence prediction model, the disease occurrence probability prediction apparatus 400 can calculate a highly accurate disease occurrence probability based on processed data that reflects various conditions.

FIGS. 4A and 4B illustrate a processed data table in which data is merged into one event per treatment date, according to an embodiment of the present invention.

Referring to FIG. 4A, the original data table 510 includes a plurality of events for each single treatment date 511, 512. For example, the original data table 510 includes two drug classification codes 521 and drug dosages 531 for the treatment date 511 of December 7, 2002. The original data table 510 therefore contains two rows for the treatment date 511 of December 7, 2002, one for each of the drug classification codes 521, A043016 and A054502. Each row for the treatment date 511 also contains the drug dosage 531. Likewise, the original data table 510 contains two rows for the treatment date 512 of December 21, 2002, one for each of the drug classification codes 522, A166503 and A037008. Each row for the treatment date 512 also contains the drug dosage 532.

Referring to FIG. 4B, the processed data table 520 includes one event per treatment date. For example, the processed data table 520 holds, in a single row, the data for a treatment date, that is, the drug dosage corresponding to each drug classification code. Specifically, the processed data table 520 includes the drug classification codes 521 and the drug dosages 531 for the single treatment date 511 of December 7, 2002. The processed data table 520 also includes the drug classification codes 522 and the drug dosages 532 for the treatment date 512 of December 21, 2002. In other words, the processed data table 520 contains one row per event, where each event merges the plurality of events belonging to one treatment date.

Accordingly, by merging the plurality of original data records for one treatment date into processed data with one event per treatment date, the disease occurrence probability prediction apparatus 400 can express the plurality of features belonging to one treatment date, for example the drug classification codes and drug dosages, as a single event.
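The merge from FIG. 4A to FIG. 4B can be sketched as a group-by on person and treatment date. The drug classification codes below are the ones quoted for FIG. 4A; the person ID, the dose values, and the choice to sum doses when a code repeats on the same date are assumptions made for illustration.

```python
from collections import defaultdict

# Raw rows: (person_id, treatment_date, drug_code, dose).
# Codes follow the FIG. 4A example; the person ID and doses are invented.
raw_rows = [
    ("P001", "2002-12-07", "A043016", 1.0),
    ("P001", "2002-12-07", "A054502", 2.0),
    ("P001", "2002-12-21", "A166503", 1.0),
    ("P001", "2002-12-21", "A037008", 3.0),
]


def merge_into_events(rows):
    """Group rows by (person, treatment date) so each visit becomes one event
    that maps every drug code to its dose."""
    events = defaultdict(dict)
    for person_id, date, drug_code, dose in rows:
        doses = events[(person_id, date)]
        # Assumed policy: sum doses if a code appears twice on one date.
        doses[drug_code] = doses.get(drug_code, 0.0) + dose
    return dict(events)


events = merge_into_events(raw_rows)
```

The four input rows collapse into two events, one for each treatment date, and each event carries both drug codes with their doses.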

FIGS. 5A and 5B illustrate a processed data table into which generated values for missing events have been entered, according to an embodiment of the present invention.

Referring to FIG. 5A, the original data table 610 includes events 611, 612, and 613, each containing yearly items such as age, blood sugar, and BMI for a personal serial number. For example, the original data table 610 includes the 2003 event 611, the 2005 event 612, and the 2009 event 613 of the same personal serial number.

Referring to FIG. 5B, the processed data table 620 includes missing events 621 generated from the 2003 event 611, the 2005 event 612, and the 2009 event 613. For example, the processed data table 620 includes the missing events 621 for 2004, 2006, 2007, and 2008. The missing events 621 for 2004, 2006, 2007, and 2008 consist of at least one of a representative value, a mean value, or an interpolated value generated from the age, blood sugar, and BMI of the 2003 event 611, the 2005 event 612, and the 2009 event 613.

Accordingly, by entering at least one of a representative value, a mean value, or an interpolated value for each missing event when generating the processed data, the disease occurrence probability prediction apparatus 400 can expand the data input to the disease occurrence prediction model and thereby increase the accuracy of the disease occurrence probability.

FIGS. 6A and 6B illustrate a processed data table into which generated values for missing data have been entered, according to an embodiment of the present invention.

Referring to FIG. 6A, the original data table 710 includes data on a plurality of events for one personal serial number. The events include a plurality of items, and missing data 711 may exist among the data corresponding to those items. The original data table 710 can therefore be filled in with values for the missing data 711 generated from the data of the plurality of items for that one personal serial number. The value entered for the missing data 711 is at least one of a representative value, a mean value, or an interpolated value generated from the data of the plurality of items for that one personal serial number.

Referring to FIG. 6B, the processed data table 720 includes data on a plurality of events for a plurality of personal serial numbers. Missing data 721 may exist among the data corresponding to the plurality of items included in those events. The processed data table 720 can therefore be filled in with values for the missing data 721 generated from the data of the plurality of items for the plurality of personal serial numbers. In other words, the missing data 721 in the processed data table 720 can be filled with at least one of a representative value, a mean value, or an interpolated value generated from the data of a plurality of other people.

Accordingly, by entering at least one of a representative value, a mean value, or an interpolated value for the missing data, based on the person's own data or on other people's data, when generating the processed data, the disease occurrence probability prediction apparatus 400 can expand the data input to the disease occurrence prediction model and thereby increase the accuracy of the disease occurrence probability.

FIGS. 7A and 7B illustrate a processed data table in which the values of a plurality of items have been normalized, according to an embodiment of the present invention.

Referring to FIG. 7A, the original data table 810 includes a plurality of events for each personal serial number. The events include a plurality of items such as BMI, systolic blood pressure, and diastolic blood pressure, and each item is entered as a numerical value in its own unit. For example, BMI is entered as a value in kg/m^2, and systolic and diastolic blood pressure are entered as values in mmHg.

Referring to FIG. 7B, the processed data table 820 contains, for the plurality of items, numerical values converted into z-scores. Each z-score is calculated from the mean and standard deviation of the numerical values of the corresponding item in its own unit. In other words, the processed data table 820 can hold, for each item, z-score values that are equivalent to expressing the numerical values of the differently measured items on one common scale.

Accordingly, by converting a plurality of items measured in different units into z-scores, the disease occurrence probability prediction apparatus 400 applies the same reference scale to those items, which makes it easier to identify the items that affect the disease occurrence probability.

FIGS. 8A and 8B illustrate a processed data table in which the values of a plurality of items have been converted into defined units, according to an embodiment of the present invention.

Referring to FIG. 8A, the original data table 910 includes a plurality of events for each personal serial number. The events include a plurality of items: height, weight, current smoking duration, current average daily smoking amount, and alcohol consumed per occasion. The numerical values of a single item may be entered in different units. For example, height may be entered in cm or ft; weight in kg or lb; current smoking duration in 5-year or 1-year units; current average daily smoking amount in half-packs or individual cigarettes; and alcohol consumed per occasion in half-bottles of soju or soju glasses.

Referring to FIG. 8B, the processed data table 920 contains, for each item, numerical values in a single unit. For example, the processed data table 920 contains numerical values for height in cm, weight in kg, current smoking duration in 1-year units, current average daily smoking amount in individual cigarettes, and alcohol consumed per occasion in soju glasses.

Accordingly, by converting the numerical values of a single item that were recorded in different units into numerical values in one unit, the disease occurrence probability prediction apparatus 400 allows the disease occurrence prediction model to accept original data that was recorded in different units, so that a highly accurate disease occurrence probability can be calculated from a wider variety of data.
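When a single item arrives in mixed units, as in FIG. 8A, one way to unify it is a lookup table of factors to a canonical unit. The sketch below is illustrative only: the cm/ft and kg/lb factors are exact definitions, while the 20-cigarette pack size behind the half-pack factor, the item names, and the function name are assumptions.

```python
# Multiplicative factors from each recorded unit to the canonical unit.
TO_CANONICAL = {
    "height_cm": {"cm": 1.0, "ft": 30.48},
    "weight_kg": {"kg": 1.0, "lb": 0.45359237},
    # Assumes 20 cigarettes per pack, so half a pack is 10 cigarettes.
    "cigarettes_per_day": {"cigarette": 1.0, "half_pack": 10.0},
}


def normalize(item, value, unit):
    """Convert `value`, recorded in `unit`, to the canonical unit of `item`."""
    return value * TO_CANONICAL[item][unit]


# Hypothetical records of one item entered in different units.
heights = [normalize("height_cm", 170.0, "cm"), normalize("height_cm", 5.0, "ft")]
smoking = normalize("cigarettes_per_day", 2, "half_pack")
```

Items such as soju half-bottles versus soju glasses would need a similar factor, which the source does not state and which is therefore omitted here.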

FIG. 9 illustrates a screen that provides disease occurrence probabilities according to an embodiment of the present invention.

Referring to FIG. 9, the disease occurrence probability providing screen 1100 may include a yearly disease occurrence probability item 1110, a disease occurrence probability item 1120, and a current user position item 1130.

Specifically, the disease occurrence probability providing screen 1100 provides the yearly disease occurrence probability item 1110, which is calculated from past health checkup data, past medical questionnaire data, and past medical record data arranged in time series. For example, the screen 1100 may provide the disease occurrence probability for 2015, corresponding to the past, for 2016, corresponding to the present, and for 2017, corresponding to the future. The screen 1100 also provides the disease occurrence probability by type of disease, that is, the disease occurrence probability item 1120. For example, the screen 1100 may show, as percentages, the probability of cardiovascular diseases such as hypertension, angina pectoris, and arteriosclerosis; the probability of cancers such as stomach cancer, colorectal cancer, and liver cancer; the probability of dementia; and the probability of diabetes. In addition, based on the calculated disease occurrence probability, the screen 1100 may provide the current user position item 1130, which shows the user's rank within the population in terms of disease occurrence probability, the corresponding percentile, and a score converted from the user's current health status. For example, the screen 1100 may show that the current user ranks about 1.9 millionth among the total population of 2.38 million for whom the disease occurrence probability was calculated, which corresponds to 80% and to a score of 90 points. Furthermore, the screen 1100 may provide the user's position by year according to the disease occurrence probability.
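The relationship between the rank and the percentage in this example can be checked with simple arithmetic: a rank of 1.9 million out of 2.38 million is roughly the 80% position. The sketch below assumes the percentile is rank divided by population; the source does not define the formula or how the 90-point score is derived, so the score is not modeled. The function name is hypothetical.

```python
def percentile_position(rank, population):
    """Position of `rank` within `population`, as a percentage
    (assumed definition: 100 * rank / population)."""
    return 100.0 * rank / population


# Figures from the example: rank 1.9 million of 2.38 million people.
pct = percentile_position(1_900_000, 2_380_000)
rounded = round(pct)
```

The exact value is about 79.8, which rounds to the 80% quoted in the example.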

Accordingly, the onset prediction server 200 provides the user's disease occurrence probability by year and by type of disease, such as cardiovascular disease, cancer, dementia, and diabetes, and also provides the user's position according to the disease occurrence probability. This allows the user to recognize more detailed disease occurrence information and allows insurers and medical institutions to write health findings more easily.

FIGS. 10A and 10B illustrate screens that provide health findings and insurance eligibility.

Referring to FIG. 10A, the health finding providing screen 1200 may include a per-disease occurrence probability item 1210 and a health finding item 1220.

Specifically, the health finding providing screen 1200 provides the per-disease occurrence probability item 1210, which shows the occurrence probability of each disease, such as hypertension, arteriosclerosis, stroke, and cerebrovascular disease. For example, the screen 1200 may show that the probability of developing hypertension is 70%, angina pectoris 50%, arteriosclerosis 80%, stomach cancer 20%, colorectal cancer 15%, liver cancer 10%, dementia 30%, and diabetes 50%. The screen 1200 may also provide the factors that raise the disease occurrence probability. For example, the screen 1200 may provide items for blood pressure, body fat, HDL cholesterol, and LDL cholesterol together with the numerical value of each item. The factors may be given different visual effects according to how strongly they influenced the disease occurrence probability. That is, the screen 1200 may mark factors that raise the disease occurrence probability with left-leaning diagonal hatching, factors with an average influence with right-leaning diagonal hatching, and factors with little influence with a dot pattern. The screen 1200 also provides the health finding item 1220, which is determined based on the per-disease occurrence probability item 1210. A health finding is a comment written with reference to the factors that cause a disease and the per-disease occurrence probabilities. Because the health finding undergoes natural language processing, the screen 1200 may also provide an assessment of the user's health status determined by that processing. That is, the screen 1200 may indicate whether the health finding is positive or negative in content. The screen 1200 further provides a send button 1230 for transmitting the health finding to the onset prediction server 200. When a selection signal for the send button 1230 is received, the health finding is transmitted to the onset prediction server 200.

Referring to FIG. 10B, the insurance suitability providing screen 1200 may include the per-disease onset probability item 1210 and an insurance suitability item 1240. The insurance suitability providing screen including the specific per-disease onset probability item 1210 is the same as described with reference to FIG. 6A, so its description is omitted.

Specifically, the insurance suitability providing screen 1200 provides the insurance suitability item 1240, which is determined by the onset prediction server 200 on the basis of the health finding. The insurance suitability item 1240 is a comment stating whether the user is suitable for insurance enrollment, based on the health finding written according to the determined disease onset probability. Furthermore, the insurance suitability providing screen 1200 may also provide a numerical score for the insurance suitability.

Accordingly, the onset prediction server 200 provides not only the per-disease onset probability but also the disease onset probability attributable to each factor that causes a disease, so that the user can recognize in specific terms which diseases the user is at high risk of developing, which factors cause those diseases, and how large the probabilities are. In addition, by providing insurance suitability based on the health finding, the onset prediction server 200 allows an insurer to judge objectively whether the user is suitable for insurance enrollment and to calculate the profitability of that enrollment more easily.

In this specification, each block or each step may represent a module, a segment, or a portion of code that includes one or more executable instructions for executing the specified logical function(s). It should also be noted that, in some alternative embodiments, the functions mentioned in the blocks or steps may occur out of order. For example, two blocks or steps shown in succession may in fact be performed substantially concurrently, or the blocks or steps may sometimes be performed in reverse order, depending on the function involved.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor, so that the processor can read information from, and write information to, the storage medium. Alternatively, the storage medium may be integral with the processor. The processor and the storage medium may reside within an application-specific integrated circuit (ASIC). The ASIC may reside within a user terminal. Alternatively, the processor and the storage medium may reside as discrete components within a user terminal.

Although the embodiments of the present invention have been described in detail with reference to the accompanying drawings, the present invention is not necessarily limited to these embodiments, and various modifications may be made without departing from the technical idea of the present invention. Accordingly, the embodiments disclosed herein are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. The scope of protection of the present invention should be construed according to the following claims, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of the present invention.

100: processed data
200: disease onset prediction model
300: disease onset probability
400: disease onset probability prediction apparatus
410: communication unit
420: processor
430: storage unit
510, 610, 710, 810, 910: original data table
511, 512: care date
520, 620, 720, 820, 920: processed data table
521, 522: administered drug classification code
531, 532: administered drug dosage
611, 612, 613: event
621: missing event
711, 721: missing data
1000: disease onset probability providing system
1100: disease onset probability providing screen
1110: disease onset probability by year item
1120: disease onset probability item
1130: current user position item
1200: health finding providing screen
1210: per-disease onset probability item
1220: health finding item
1230: send button
1240: insurance suitability item

Claims

A method for predicting the onset of a disease, comprising:
Receiving original data including a plurality of items from at least one external database;
Generating processed data that represents one medical treatment or one health examination as one event according to a predetermined criterion, based on the original data;
Inputting the processed data into a disease onset prediction model; and
Calculating a disease onset probability for at least one disease using the disease onset prediction model.

The method according to claim 1,
Wherein the disease is at least one of cardiovascular disease, stomach cancer, liver cancer, colon cancer, lung cancer, breast cancer, prostate cancer, dementia or diabetes, and the disease onset prediction model is separately constructed for each of the diseases.

The method according to claim 1,
Wherein the step of receiving the original data comprises:
Receiving at least one of sociological data, medical treatment record data including the one medical treatment, or health examination data including the one health examination.

The method according to claim 1,
Wherein the step of generating the processed data comprises:
When a plurality of pieces of the original data exist for one care date, integrating those pieces of original data into one event for the one care date.
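
The integration recited in this claim can be illustrated with a minimal Python sketch. It assumes that each original record is a dictionary with the hypothetical keys `care_date`, `drug_code`, and `dosage` (names chosen for illustration only, not taken from the specification) and that integration simply collects every record sharing a care date into one event.

```python
from collections import defaultdict


def merge_records_by_date(records):
    """Group raw records by care date and merge each group into one event."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["care_date"]].append(record)

    events = []
    for care_date in sorted(grouped):
        # One event per care date; the per-record fields become lists.
        event = {"care_date": care_date, "drug_codes": [], "dosages": []}
        for record in grouped[care_date]:
            event["drug_codes"].append(record["drug_code"])
            event["dosages"].append(record["dosage"])
        events.append(event)
    return events


# Illustrative records (hypothetical values): two prescriptions on the same day.
raw_records = [
    {"care_date": "2016-01-05", "drug_code": "A01", "dosage": 10.0},
    {"care_date": "2016-01-05", "drug_code": "B02", "dosage": 5.0},
    {"care_date": "2016-02-10", "drug_code": "A01", "dosage": 10.0},
]
events = merge_records_by_date(raw_records)
```

In this sketch the three records collapse into two events, and the event for the first care date carries both drug classification codes and both dosages.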

The method according to claim 1,
Wherein the one event includes data on an administered drug classification code and a dosage.

The method according to claim 1,
Further comprising the step of filtering, from among the plurality of items, items associated with the onset of the disease.

The method according to claim 6,
Wherein the items associated with the onset of the disease number at least 50.

The method according to claim 1,
Wherein the step of generating the processed data comprises:
Determining whether a missing event is present among the events;
When the missing event is present, generating at least one of a representative value, an average value, or an interpolated value for the missing event; and
Inputting at least one of the representative value, the average value, or the interpolated value into the missing event.
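
As a rough illustration of filling a missing event with an interpolated value, the following Python sketch assumes that events are recorded once per year and carry a single numeric item. The function name `fill_missing_events`, the yearly spacing, and the choice of linear interpolation are assumptions made for this example; the claim equally allows a representative or average value.

```python
def fill_missing_events(events_by_year):
    """Fill interior gaps in a yearly series by linear interpolation."""
    years = sorted(events_by_year)
    filled = dict(events_by_year)
    for prev_year, next_year in zip(years, years[1:]):
        gap = next_year - prev_year
        prev_value = events_by_year[prev_year]
        next_value = events_by_year[next_year]
        # Every year strictly between two observed years is a missing event.
        for step in range(1, gap):
            weight = step / gap
            filled[prev_year + step] = prev_value + weight * (next_value - prev_value)
    return dict(sorted(filled.items()))


# Illustrative systolic blood pressure readings with no event for 2014.
systolic_bp = {2012: 120.0, 2013: 124.0, 2015: 132.0}
completed = fill_missing_events(systolic_bp)
```

Here the missing 2014 event receives the value halfway between the 2013 and 2015 readings. Gaps before the first or after the last observed event are left untouched in this sketch.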

The method according to claim 1,
Wherein the step of generating the processed data comprises:
Determining whether missing data is present in the plurality of items included in the event;
When the missing data is present, generating at least one of a representative value, an average value, or an interpolated value for the missing data; and
Inputting at least one of the representative value, the average value, or the interpolated value into the missing data.
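
Filling missing data inside an event with an average value could look like the minimal Python sketch below. It assumes that each event is a dictionary, that missing data is marked with `None`, and that the average is taken over the observed values of the same item; the item names `hdl` and `ldl` and the function name are hypothetical.

```python
from statistics import mean


def impute_missing_items(events, item_names):
    """Replace None entries with the mean of the observed values of that item."""
    observed_means = {}
    for name in item_names:
        observed = [e[name] for e in events if e.get(name) is not None]
        # If an item was never observed there is nothing to average.
        observed_means[name] = mean(observed) if observed else None

    completed = []
    for event in events:
        filled = dict(event)  # copy so the original events stay unchanged
        for name in item_names:
            if filled.get(name) is None:
                filled[name] = observed_means[name]
        completed.append(filled)
    return completed


# Illustrative events: one missing HDL value and one missing LDL value.
events = [
    {"hdl": 50.0, "ldl": 130.0},
    {"hdl": None, "ldl": 110.0},
    {"hdl": 60.0, "ldl": None},
]
completed = impute_missing_items(events, ["hdl", "ldl"])
```

The function returns new dictionaries instead of modifying its input, so the original data and the processed data can be stored side by side.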

The method according to claim 1,
Wherein the step of generating the processed data comprises:
Calculating a distribution based on the frequency of event lengths; and
Generating the processed data so as to include only events corresponding to a predetermined threshold value in the distribution,
Wherein the threshold value is a value at which the length of the event lies within a 95% region extending left and right from the center of the distribution.
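
One possible reading of this length filter is sketched below in Python. It assumes that "length" means the number of events in one user's sequence and that the 95% region is the central 95% of the observed lengths, so that the shortest and longest 2.5% are trimmed. Both the interpretation and the function name `filter_by_event_length` are assumptions for illustration, not the specification's definition.

```python
def filter_by_event_length(sequences, coverage_percent=95):
    """Keep sequences whose length lies in the central share of all lengths."""
    if not sequences:
        return [], None
    lengths = sorted(len(seq) for seq in sequences)
    n = len(lengths)
    # Number of lengths trimmed from each tail (integer arithmetic avoids
    # floating-point rounding surprises).
    k = n * (100 - coverage_percent) // 200
    low, high = lengths[k], lengths[n - 1 - k]
    kept = [seq for seq in sequences if low <= len(seq) <= high]
    return kept, (low, high)


# Illustrative data: 40 sequences, one very short and one very long.
example_lengths = [1] + list(range(10, 48)) + [200]
sequences = [["event"] * length for length in example_lengths]
kept, bounds = filter_by_event_length(sequences)
```

With 40 sequences, one sequence is trimmed from each tail, so the outliers of length 1 and 200 are dropped and the remaining 38 sequences are kept.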

The method according to claim 1,
Wherein the step of generating the processed data comprises:
Calculating an average and a standard deviation of the data of the plurality of items included in the event;
Converting the data of the plurality of items into z-scores using the average and the standard deviation; and
Inputting the z-scores as the data of the plurality of items.
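
The z-score conversion can be sketched in Python as follows. The sketch standardizes one item at a time and uses the population standard deviation; whether the specification intends the population or the sample standard deviation is not stated here, so that choice, like the function name `to_z_scores`, is an assumption.

```python
from statistics import mean, pstdev


def to_z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so every z-score is defined as zero here.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Illustrative systolic blood pressure values for one item.
systolic = [110.0, 120.0, 130.0]
z = to_z_scores(systolic)
```

After the conversion, items measured on different scales (for example blood pressure and cholesterol) become comparable, because each has mean 0 and standard deviation 1.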

The method according to claim 1,
Wherein the step of generating the processed data comprises:
Extracting the unit corresponding to each of the plurality of items; and
Converting each of the units into a unit defined for the processed data.
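
A minimal Python sketch of the unit conversion is given below. The target units, the conversion table, and the function name `normalize_units` are hypothetical; the specification does not list which units the processed data defines.

```python
# Unit defined for each item in the processed data (assumed for illustration).
TARGET_UNITS = {"weight": "kg", "height": "cm"}

# Multiplicative factors from a source unit to the target unit of the item.
TO_TARGET = {
    ("weight", "kg"): 1.0,
    ("weight", "g"): 0.001,
    ("height", "cm"): 1.0,
    ("height", "m"): 100.0,
}


def normalize_units(item, value, unit):
    """Convert a value to the unit defined for its item in the processed data."""
    if (item, unit) not in TO_TARGET:
        raise ValueError(f"no conversion defined for {item} in {unit}")
    return value * TO_TARGET[(item, unit)], TARGET_UNITS[item]


# Illustrative conversions of values recorded by different institutions.
height_value, height_unit = normalize_units("height", 1.75, "m")
weight_value, weight_unit = normalize_units("weight", 65000, "g")
```

Raising an error for an unknown unit, rather than passing the value through, keeps silently mis-scaled values out of the prediction model input.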

The method according to claim 1,
Wherein the step of generating the processed data comprises:
Generating the processed data so as to include only a part of the data of the plurality of items.

The method according to claim 1,
Wherein the step of calculating the disease onset probability comprises:
Calculating at least one of a probability of onset of the disease or a probability of onset for each type of the disease.

The method according to claim 1,
Further comprising the step of calculating a body age or an expected life span using the disease onset prediction model.

An apparatus for predicting the onset of a disease, comprising:
A communication unit configured to receive original data including a plurality of items from at least one external database;
A processor configured to generate processed data that represents one medical treatment or one health examination as one event according to a predetermined criterion, based on the original data; and
A storage unit configured to store the original data and the processed data,
Wherein the processor is configured to input the processed data into a disease onset prediction model and to calculate a disease onset probability for at least one disease using the disease onset prediction model.

The apparatus according to claim 16,
Wherein the original data includes at least one of sociological data, medical treatment record data including the one medical treatment, or health examination data including the one health examination.

The apparatus according to claim 16,
Wherein the processor is configured to:
Determine whether a missing event is present among the events;
When the missing event is present, generate at least one of a representative value, an average value, or an interpolated value for the missing event; and
Input at least one of the representative value, the average value, or the interpolated value into the missing event.

The apparatus according to claim 16,
Wherein the processor is configured to:
Determine whether missing data is present in the plurality of items included in the event;
When the missing data is present, generate at least one of a representative value, an average value, or an interpolated value for the missing data; and
Input at least one of the representative value, the average value, or the interpolated value into the missing data.

The apparatus according to claim 16,
Wherein the processor is configured to:
Calculate a distribution based on the frequency of event lengths; and
Generate the processed data so as to include only events corresponding to a predetermined threshold value in the distribution,
Wherein the threshold value is a value at which the length of the event lies within a 95% region extending left and right from the center of the distribution.

The apparatus according to claim 16,
Wherein the processor is configured to:
Calculate an average and a standard deviation of the data of the plurality of items included in the event;
Convert the data of the plurality of items into z-scores using the average and the standard deviation; and
Input the z-scores as the data of the plurality of items.

The apparatus according to claim 16,
Wherein the processor is configured to:
Extract the unit corresponding to each of the plurality of items; and
Convert each of the units into a unit defined for the processed data.