KR20190069047A - Apparatus and method for predicting disease

Info

Publication number: KR20190069047A
Application number: KR1020170169399A
Authority: KR
Inventors: 김의직; 이솔비; 권정혁
Original assignee: 한림대학교 산학협력단
Priority date: 2017-12-11
Filing date: 2017-12-11
Publication date: 2019-06-19
Also published as: KR102030435B1

Abstract

The present invention relates to a method for predicting a disease, comprising the steps of: receiving health data of a prediction subject; generating a disease prediction model by using an integrated data set generated based on electronic health records (EHR) and mobile personal health records (mPHR) obtained through a mobile device; and predicting, based on the disease prediction model, the likelihood of disease for the prediction subject corresponding to the health data.

Description

APPARATUS AND METHOD FOR PREDICTING DISEASE

The present application relates to an apparatus and method for predicting a disease.

Recently, large amounts of medical data that had been stored in hard-copy form are being rapidly digitized and accumulated in the medical industry. In addition, the rapid development of medical devices and platforms based on the Internet of Things (IoT), together with the growing spread of mobile devices, has made it possible to collect medical big data for individual users. Many organizations and hospitals can obtain valuable information from their vast collections of digitized electronic health records (EHR). In recent years, various medical studies using EHR have been conducted to obtain meaningful information. That is, awareness of the importance of medical big data analysis is spreading, and accordingly there is a need to use techniques such as data mining to find meaningful information effectively in vast amounts of data.

EHR, which most medical institutions use to improve the quality of care and patient health, consist of electronically recorded health information (data) related to a patient's overall health care. Medical big data such as EHR are difficult to analyze with conventional methods because they contain diverse data types and high-dimensional data. Moreover, because EHR are relatively static compared with the mPHR described below, it is difficult to determine a patient's current health state from EHR alone. Nevertheless, EHR have the major advantage that, because they contain the treatment and prescription information provided by medical professionals, they enable healthcare providers to deliver a higher level of medical service to patients.

Meanwhile, because mobile personal health records (mPHR) are collected through mobile medical devices, it is difficult to verify the accuracy and reliability of the data. This is because mPHR data are neither collected by medical professionals nor collected using hospital medical equipment. Such mPHR are therefore difficult to use for in-depth analysis related to disease. However, mPHR have the major advantage that the patient's state is measured and updated at relatively short intervals compared with EHR, so the patient's current state is reflected more accurately. In view of these limitations of EHR and mPHR, extensive research related to EHR and mPHR has been conducted from various angles in order to provide improved medical services.

For example, the paper ["A hybrid outlier detection method for health care big data", Ke Yan, Xiaoming You, Xiaobo Ji, Guangqiang Yin, Fan Yang, 2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom) (BDCloud-SocialCom-SustainCom) (2016), pp. 157-162] proposed a new hybrid outlier detection method called pruning-based K-Nearest Neighbor (PB-KNN). By integrating density-based and cluster-based methods with the KNN algorithm, this method overcomes the difficulty of data analysis in the medical field, where the data are voluminous, of diverse types, and high-dimensional. However, because the method proposed in that paper uses only EHR, it is difficult for it to reflect the user's current state, which limits the analysis.

In addition, the paper ["Outlier detection for patient monitoring and alerting", Milos Hauskrecht, Iyad Batal, Michal Valko, Shyam Visweswaran, Gregory F Cooper, Gilles Clermont, Journal of Biomedical Informatics, Vol. 46, No. 1 (2013), pp. 47-55] proposed a new data-driven monitoring and alerting framework. That paper aims to detect medical outlier information using past patient cases stored in EHR. However, because the technique of that paper uses only previously recorded EHR, it is difficult to confirm the actual status of the patient. That is, that paper also has difficulty reflecting the user's current state, which limits the analysis.

The present application is intended to solve the problems of the prior art described above. Its object is to provide an integrated data set in which EHR and mPHR are integrated, thereby resolving, in the provision of medical services, both the difficulty of determining a patient's current state when EHR are used and the difficulty of verifying the accuracy and reliability of data when mPHR are used.

The present application is also intended to solve the problems of the prior art described above. Its object is to provide an apparatus and method for generating an integrated data set, and an apparatus and method for predicting a disease, that enable more accurate and reliable health-related diagnosis or analysis while reflecting the user's current state.

However, the technical problems to be solved by the embodiments of the present application are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above technical object, a method of generating a data set for providing a healthcare service according to a first aspect of the present application may include: performing preprocessing on the data included in each of a first data set including electronic health records (EHR) and a second data set including mobile personal health records (mPHR) obtained through a mobile device; and generating, through matching between the data of the preprocessed first data set and the data of the preprocessed second data set, an integrated data set in which the data of the preprocessed first data set and the data of the preprocessed second data set are integrated.

In addition, in the performing of the preprocessing, when a missing value or an outlier is determined to exist in the data included in each of the first data set and the second data set, preprocessing may be performed that replaces the missing value or outlier with the mean of the attribute values that belong to the same attribute as the missing value or outlier within the data set to which it belongs.
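The mean-replacement preprocessing described above can be sketched as follows. This is a minimal illustration, not the implementation of the present application: the function name `impute_with_attribute_mean`, the record layout (a list of dictionaries), and the `is_valid` predicate used to flag outliers are all assumptions, since the text does not state how outliers are detected.

```python
from statistics import mean


def impute_with_attribute_mean(records, attribute, is_valid):
    """Replace missing values and outliers of one attribute with the attribute mean.

    records   -- list of dicts, one per record of a single data set (assumed layout)
    attribute -- name of the attribute (column) to clean
    is_valid  -- hypothetical predicate; values for which it returns False are
                 treated as outliers (the source does not define the criterion)
    """
    # The mean is taken only over values that are present and not outliers.
    valid_values = [
        r[attribute]
        for r in records
        if r.get(attribute) is not None and is_valid(r[attribute])
    ]
    if not valid_values:
        # Nothing to compute a mean from; return unchanged copies.
        return [dict(r) for r in records]

    fill_value = mean(valid_values)
    cleaned = []
    for r in records:
        new_record = dict(r)
        value = r.get(attribute)
        if value is None or not is_valid(value):
            new_record[attribute] = fill_value
        cleaned.append(new_record)
    return cleaned


# Illustrative values only; the range check is an assumed outlier rule.
ehr = [
    {"cholesterol": 200},
    {"cholesterol": None},   # missing value
    {"cholesterol": 240},
    {"cholesterol": 9999},   # implausible value treated as an outlier
]
cleaned_ehr = impute_with_attribute_mean(ehr, "cholesterol", lambda v: 0 < v < 1000)
```

Because the mean is computed within the data set that the value belongs to, the function would be called separately on the first (EHR) data set and the second (mPHR) data set.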

In addition, in the generating, the integrated data set may be generated through the matching, which takes into account whether the data attributes of the first data set and the data attributes of the second data set are the same.

In addition, in the generating, when a data attribute of the first data set and a data attribute of the second data set are determined not to be the same, the data attribute of the second data set may be combined with the data attributes of the first data set, and when a data attribute of the first data set and a data attribute of the second data set are determined to be the same, the data attribute of the first data set may be overwritten with the data attribute of the second data set.
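The matching rule above (combine attributes that differ, overwrite attributes that coincide) can be sketched for a single user as follows. This is an illustrative sketch under assumptions: the function name `integrate_record`, the dictionary-based record layout, and the attribute names are hypothetical, and how the EHR and mPHR records of the same user are paired is not specified in the text.

```python
def integrate_record(ehr_record, mphr_record):
    """Merge one preprocessed EHR record with one preprocessed mPHR record.

    Returns the integrated record plus the lists of attributes that were
    overwritten (present in both) and appended (present only in the mPHR).
    """
    integrated = dict(ehr_record)
    overwritten, appended = [], []
    for attribute, value in mphr_record.items():
        if attribute in ehr_record:
            # Same attribute in both data sets: the more recent mPHR value
            # overwrites the EHR value.
            overwritten.append(attribute)
        else:
            # Attribute exists only in the mPHR: it is combined with
            # (added to) the EHR attributes.
            appended.append(attribute)
        integrated[attribute] = value
    return integrated, overwritten, appended


# Illustrative values only.
ehr_record = {"age": 54, "blood_pressure": 130, "cholesterol": 220}
mphr_record = {"blood_pressure": 124, "heart_rate": 72}
integrated, overwritten, appended = integrate_record(ehr_record, mphr_record)
```

In this example the integrated record keeps `age` and `cholesterol` from the EHR, takes `blood_pressure` from the mPHR, and gains `heart_rate` as a new attribute.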

In addition, the method of generating a data set for providing a healthcare service according to the first aspect of the present application may further include, before the performing of the preprocessing, generating the first data set and the second data set.

In addition, in the generating of the first data set and the second data set, the first data set may be generated by integrating data belonging to a plurality of data sets that are provided by the University of California-Irvine (UCI) Machine Learning Repository and that originate from different locations, and the second data set may be generated such that the attributes of the data obtained through the mobile device match the attributes of the first data set.

An apparatus for generating a data set for providing a healthcare service according to a second aspect of the present application may include: a preprocessing unit that performs preprocessing on the data included in each of a first data set including electronic health records (EHR) and a second data set including mobile personal health records (mPHR) obtained through a mobile device; and a matching unit that generates, through matching between the data of the preprocessed first data set and the data of the preprocessed second data set, an integrated data set in which the data of the preprocessed first data set and the data of the preprocessed second data set are integrated.

A system for generating a data set for providing a healthcare service according to a third aspect of the present application may include: devices of a plurality of medical institutions that provide electronic health records (EHR) for a plurality of users; a plurality of mobile devices that measure and provide mobile personal health records (mPHR) for each of the plurality of users; and a data set generation apparatus that generates a first data set including the electronic health records and a second data set including the personal health records obtained through the mobile devices, performs preprocessing on the data included in each of the first data set and the second data set, and generates, through matching between the data of the preprocessed first data set and the data of the preprocessed second data set, an integrated data set in which the data of the preprocessed first data set and the data of the preprocessed second data set are integrated.

A computer program according to a fourth aspect of the present application may be stored on a recording medium in order to execute the data set generation method according to the first aspect of the present application.

A disease prediction method according to a fifth aspect of the present application may include: receiving health data of a prediction subject; generating a disease prediction model using an integrated data set generated based on electronic health records (EHR) and mobile personal health records (mPHR) obtained through a mobile device; and predicting, based on the disease prediction model, the likelihood of disease for the prediction subject corresponding to the health data.

In addition, in the generating, the disease prediction model may be generated after the health data is received, taking into account at least one of the characteristics of the prediction subject and the type of disease to be predicted.

In addition, the generating may include generating, based on the data included in the integrated data set, a plurality of disease prediction models that take into account the characteristics of a plurality of users, and, after the health data is received, selecting from among the plurality of disease prediction models an optimal disease prediction model optimized for the prediction subject, taking into account at least one of the characteristics of the prediction subject and the type of disease to be predicted. In the predicting, the likelihood of disease for the prediction subject may be predicted based on the optimal disease prediction model.
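The selection step above can be sketched as a lookup over pre-generated models. Everything specific in this sketch is an assumption made for illustration: the text does not state which subject characteristics are used or how a model is judged optimal, so the age grouping, the registry keys, the names `select_model` and `predict_disease`, and the threshold rules standing in for trained models are all hypothetical and carry no clinical meaning.

```python
def select_model(models, subject, disease_type):
    """Pick the pre-generated model that matches the subject and disease type.

    models -- dict mapping (disease_type, age_group) to a callable model
              (hypothetical registry layout)
    """
    # Assumed characteristic: a two-way age grouping.
    age_group = "under_50" if subject["age"] < 50 else "50_and_over"
    key = (disease_type, age_group)
    if key not in models:
        raise LookupError(f"no model registered for {key}")
    return models[key]


def predict_disease(models, subject, disease_type):
    """Apply the selected model to the subject's health data."""
    model = select_model(models, subject, disease_type)
    return model(subject)


# Placeholder rules standing in for trained models; thresholds are illustrative.
EXAMPLE_MODELS = {
    ("heart_disease", "under_50"): lambda s: s["heart_rate"] > 100,
    ("heart_disease", "50_and_over"): lambda s: s["blood_pressure"] > 140,
}

older_subject = {"age": 62, "blood_pressure": 150, "heart_rate": 70}
younger_subject = {"age": 35, "blood_pressure": 150, "heart_rate": 80}
older_result = predict_disease(EXAMPLE_MODELS, older_subject, "heart_disease")
younger_result = predict_disease(EXAMPLE_MODELS, younger_subject, "heart_disease")
```

Generating the models ahead of time and selecting one on request keeps the per-request work small, whereas the alternative described in the preceding paragraph builds a model only after the health data arrives.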

In addition, the disease prediction model may include a plurality of rules related to a plurality of attributes of the data included in the integrated data set, and the number of rules combined and the order in which they are combined may be determined taking into account at least one of the characteristics of the prediction subject and the type of disease to be predicted.

In addition, the disease prediction model may be generated by applying a decision tree to the data included in the integrated data set.
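As a rough illustration of applying a decision tree to integrated records, the sketch below builds a small binary tree over numeric attributes using Gini impurity. The text does not name a specific tree algorithm, so the splitting criterion, the function names (`gini`, `best_split`, `build_tree`, `predict`), the depth limit, and the sample records are all assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, attributes):
    """Return (weighted_gini, attribute, threshold) of the best split, or None."""
    best = None
    for attr in attributes:
        values = sorted({row[attr] for row in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[attr] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[attr] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, attr, threshold)
    return best


def build_tree(rows, labels, attributes, max_depth=3):
    """Recursively build a tree; leaves are class labels, nodes are dicts."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels, attributes)
    if split is None:  # no attribute has two distinct values left
        return majority
    _, attr, threshold = split
    left_idx = [i for i, row in enumerate(rows) if row[attr] <= threshold]
    right_idx = [i for i, row in enumerate(rows) if row[attr] > threshold]
    return {
        "attribute": attr,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left_idx],
                           [labels[i] for i in left_idx], attributes, max_depth - 1),
        "right": build_tree([rows[i] for i in right_idx],
                            [labels[i] for i in right_idx], attributes, max_depth - 1),
    }


def predict(tree, row):
    """Follow the rules from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["attribute"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Illustrative integrated records; label 1 means disease present.
rows = [
    {"blood_pressure": 110, "heart_rate": 60},
    {"blood_pressure": 120, "heart_rate": 65},
    {"blood_pressure": 150, "heart_rate": 90},
    {"blood_pressure": 160, "heart_rate": 95},
]
labels = [0, 0, 1, 1]
tree = build_tree(rows, labels, ["blood_pressure", "heart_rate"])
```

Each internal node of the resulting tree is one rule on one attribute, which is consistent with the description of the model as a combination of attribute-related rules applied in order.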

In addition, the integrated data set may be generated by performing preprocessing on the data included in each of a first data set including the electronic health records and a second data set including the personal health records obtained through the mobile device, and then matching the data of the preprocessed first data set with the data of the preprocessed second data set.

A disease prediction apparatus according to a sixth aspect of the present application may include: a receiving unit that receives health data of a prediction subject; a generation unit that generates a disease prediction model using an integrated data set generated based on electronic health records (EHR) and mobile personal health records (mPHR) obtained through a mobile device; and a prediction unit that predicts, based on the disease prediction model, the likelihood of disease for the prediction subject corresponding to the health data.

A disease prediction system according to a seventh aspect of the present application may include: a first terminal device that provides health data of a prediction subject; and a disease prediction apparatus that receives the health data from the first terminal device, generates a disease prediction model using an integrated data set generated based on electronic health records (EHR) and mobile personal health records (mPHR) obtained through a mobile device, predicts, based on the disease prediction model, the likelihood of disease for the prediction subject corresponding to the health data, and provides the prediction result.

A computer program according to an eighth aspect of the present application may be stored on a recording medium in order to execute the disease prediction method according to the fifth aspect of the present application.

The means for solving the problems described above are merely exemplary and should not be construed as limiting the present application. In addition to the exemplary embodiments described above, additional embodiments may exist in the drawings and the detailed description of the invention.

According to the means for solving the problems described above, providing an integrated data set in which EHR and mPHR are integrated can resolve, in the provision of medical services, both the difficulty of determining a patient's current state when EHR are used and the difficulty of verifying the accuracy and reliability of data when mPHR are used.

According to the means for solving the problems described above, the integrated data set and the disease prediction model generated using the integrated data set make it possible to provide more accurate and reliable medical services (that is, health-related diagnosis or analysis) while reflecting the user's current state.

However, the effects obtainable from the present application are not limited to those described above, and other effects may exist.

FIG. 1 is a diagram showing a schematic configuration of a data set generation system for providing a healthcare service according to an embodiment of the present application.
FIG. 2 is a diagram showing the attributes of the EHR considered in a data set generation apparatus for providing a healthcare service according to an embodiment of the present application.
FIG. 3 is a diagram showing a schematic configuration of a disease prediction system according to an embodiment of the present application.
FIG. 4 is a diagram showing an example of a disease prediction model generated by applying a decision tree to an integrated data set according to an embodiment of the present application.
FIG. 5 is a diagram showing an example of a disease prediction model generated by applying a decision tree to conventional EHR.
FIG. 6 is a diagram showing an example of a disease prediction model generated by applying a decision tree to conventional mPHR.
FIG. 7 is a diagram showing the accuracy of a disease prediction model based on an integrated data set according to an embodiment of the present application, compared with EHR or mPHR.
FIG. 8 is an operation flowchart of a method of generating a data set for providing a healthcare service according to an embodiment of the present application.
FIG. 9 is an operation flowchart of a disease prediction method according to an embodiment of the present application.

Hereinafter, embodiments of the present application are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present application pertains can easily carry them out. The present application may, however, be implemented in many different forms and is not limited to the embodiments described here. In the drawings, parts unrelated to the description are omitted in order to describe the present application clearly, and similar parts are given similar reference numerals throughout the specification.

Throughout this specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" or "indirectly connected" with another element in between.

Throughout this specification, when a member is said to be located "on", "above", "at the top of", "under", "below", or "at the bottom of" another member, this includes not only the case where the member is in contact with the other member but also the case where yet another member exists between the two members.

Throughout this specification, when a part is said to "include" a component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them.

The present application proposes a technique for generating an integrated data set for medical services that includes electronic health records (EHR) and mobile personal health records (mPHR) obtained through mobile devices.

Briefly, before the detailed description, the EHR may consist of 920 records that facilitate a high level of care. These records may include many attributes related to heart disease that are measured at medical institutions. However, EHR are relatively static compared with mPHR, because the information is updated only when the user (patient) receives care at a hospital. Therefore, when analysis or diagnosis of a user's health state is performed using EHR alone, it may be difficult to diagnose the patient's current health state accurately.

Therefore, to overcome this limitation, the present application proposes a technique for generating an integrated data set in which EHR and mPHR are integrated. Here, mPHR may refer to a set of data (a data set) that a mobile device measures periodically, and when mPHR are used in analyzing or diagnosing a user's health state, an accurate analysis or diagnosis of the user's current state can be made. A detailed description of the method of generating the integrated data set follows.

FIG. 1 is a diagram showing a schematic configuration of a data set generation system for providing a healthcare service according to an embodiment of the present application (hereinafter referred to as "the present data set generation system").

Referring to FIG. 1, the present data set generation system 100 may include devices 10 of a plurality of medical institutions, a plurality of mobile devices 20, and a data set generation apparatus 30 for providing a healthcare service (hereinafter referred to as "the present data set generation apparatus").

The devices 10 of the plurality of medical institutions may provide electronic health records (EHR) for a plurality of users. The devices 10 may electronically record information (data) related to the health of a plurality of users (patients) and provide it as EHR. That is, the EHR may be provided by the devices 10 of the plurality of medical institutions. The health-related information (data) of a user obtainable from the EHR may include, but is not limited to, age, sex, cholesterol, blood sugar, and blood pressure. The medical institutions corresponding to the devices 10 may include, for example, hospitals, Korean medicine hospitals, dental hospitals, university hospitals, midwifery clinics, clinics, Korean medicine clinics, dental clinics, long-term care hospitals, and general hospitals, but are not limited thereto.

The plurality of mobile devices 20 may measure and provide mobile personal health records (mPHR) for each of a plurality of users. That is, the mPHR, which are mobile-based personal health records, may be measured and provided through the mobile device carried by each user. The health-related information (data) of a user obtainable from the mPHR may include, but is not limited to, blood pressure, heart rate, blood sugar, and active calories.

The mobile device 20 is a mobile communication device with guaranteed portability and mobility, and may include any kind of wireless communication device, for example a PCS (Personal Communication System), GSM (Global System for Mobile communication), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (Wideband Code Division Multiple Access), or Wibro (Wireless Broadband Internet) terminal, a smartphone, a smart pad, a tablet PC, a notebook computer, or a wearable device, but is not limited thereto.

The present data set generation apparatus 30 may generate a first data set including the electronic health records (EHR) obtained from the devices 10 of the plurality of medical institutions and a second data set including the mobile personal health records (mPHR) obtained from the plurality of mobile devices 20, perform preprocessing on the data included in each of the first data set and the second data set, and generate, through matching between the data of the preprocessed first data set and the data of the preprocessed second data set, an integrated data set in which the data of the preprocessed first data set and the data of the preprocessed second data set are integrated.

Here, the EHR that the present data set generation apparatus 30 obtains from the devices 10 of the plurality of medical institutions and the mPHR that it obtains from the plurality of mobile devices 20 may be obtained through a network 40.

The network 40 may include, for example, a 3GPP (3rd Generation Partnership Project) network, an LTE (Long Term Evolution) network, a WiMAX (Worldwide Interoperability for Microwave Access) network, the Internet, a LAN (Local Area Network), a wireless LAN (Wireless Local Area Network), a WAN (Wide Area Network), a PAN (Personal Area Network), a Bluetooth network, an NFC (Near Field Communication) network, a satellite broadcasting network, an analog broadcasting network, and a DMB (Digital Multimedia Broadcasting) network, but is not limited thereto.

The data set generating apparatus 30 proposes a technique for generating an integrated data set in which EHR and mPHR are integrated, in view of the following. EHR contains treatment and prescription information provided by medical professionals and can therefore provide reliable medical information about a user; however, because it is relatively static compared with mPHR, it is difficult to accurately determine the user's current health state from EHR alone. mPHR, in contrast, is measured and updated over relatively short intervals compared with EHR and can therefore accurately capture the user's current state; however, because the information is neither collected by medical professionals nor measured with hospital medical equipment, it is somewhat less reliable. The data set generating apparatus 30 is described in more detail below.

In the following description, the data set generating apparatus 30 may generate an integrated data set that reflects the user's current state so that, for example, the presence or absence of heart disease can be accurately diagnosed. That is, the present application describes a technique for generating an integrated data set that enables analysis focused on heart disease as one example among various types of diseases; however, it is not limited thereto, and an integrated data set that facilitates analysis of various other types of diseases may also be generated.

The data set generating apparatus 30 may include a data set generating unit 31, a preprocessing unit 32, and a matching unit 33.

The data set generating unit 31 may generate a first data set including electronic health records (EHR) and a second data set including mobile personal health records (mPHR) acquired through mobile devices.

The data set generating unit 31 may generate the first data set by integrating data belonging to a plurality of data sets that were collected at different sites and are provided by the UCI (University of California, Irvine) Machine Learning Repository. The data set generating unit 31 may also generate the second data set such that the attributes of the data acquired through the mobile device 20 (i.e., the mPHR data) match the attributes of the first data set.

Specifically, so that analysis focused on heart disease can be performed as described above, the data set generating unit 31 may generate the first data set including EHR by using data from the heart disease data set directory of the UCI Machine Learning Repository. The heart disease data set directory may contain four data sets related to the diagnosis of heart disease. Each of the four data sets may be constructed from data collected at one of four sites (i.e., Cleveland Clinic Foundation, OH, USA; Hungarian Institute of Cardiology, Budapest, Hungary; Veterans Affairs Medical Center, CA, USA; and University Hospital, Zurich, Switzerland). Each of the four data sets may consist of 14 raw attributes in the same instance format. The data set generating unit 31 may, for example, generate the first data set by integrating the four data sets collected at the different sites.
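The integration of the per-site data sets into the first data set can be pictured with a minimal sketch. It assumes each site supplies rows of 14 values in the attribute order listed in this description; the function name `build_first_dataset`, the site keys, and the toy rows are hypothetical and are not taken from the source.

```python
# The 14 raw attribute names, in the order listed in the description.
EHR_ATTRIBUTES = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",
]

def build_first_dataset(site_rows):
    """Merge per-site row lists into one list of attribute-keyed records."""
    merged = []
    for site, rows in site_rows.items():
        for row in rows:
            if len(row) != len(EHR_ATTRIBUTES):
                raise ValueError(f"{site}: expected {len(EHR_ATTRIBUTES)} values")
            merged.append(dict(zip(EHR_ATTRIBUTES, row)))
    return merged

# Toy rows for illustration only (not real patient data); None marks a missing value.
site_rows = {
    "cleveland": [[63, 1, 1, 145, 233, 1, 2, 150, 0, 2.3, 3, 0, 6, 0]],
    "hungarian": [[28, 1, 2, 130, 132, 0, 2, 185, 0, 0.0, None, None, None, 0]],
}
first_dataset = build_first_dataset(site_rows)
print(len(first_dataset), first_dataset[0]["trestbps"])
```

Because all sites share the same instance format, the merge is a plain concatenation; missing values are carried through unchanged so that a later cleansing step can handle them.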

FIG. 2 is a diagram showing the attributes of the EHR considered by the data set generating apparatus for providing a healthcare service according to an embodiment of the present application.

Referring to FIG. 2, the EHR considered by the data set generating apparatus 30 may include 14 attributes: age (age), sex (sex), chest pain type (cp), resting blood pressure (trestbps), serum cholesterol (chol), fasting blood sugar (fbs), resting electrocardiographic results (restecg), maximum heart rate achieved (thalach), exercise-induced angina (exang), ST depression induced by exercise relative to rest (oldpeak), the slope of the peak exercise ST segment (slope), number of major vessels colored by fluoroscopy (ca), thalassemia (thal), and existence of heart disease (num).

All attribute values of the EHR may be expressed as numeric values. For example, sex (sex) may be expressed as 1 for male and 0 for female. Chest pain type (cp) may be expressed as 1 for typical angina, 2 for atypical angina, 3 for non-anginal pain, and 4 for asymptomatic. Fasting blood sugar (fbs) may be expressed as 1 if it exceeds 120 mg/dl and 0 otherwise. Resting electrocardiographic results (restecg) may be expressed as 0 for normal, 1 for having ST-T wave abnormality, and 2 for probable or definite left ventricular hypertrophy. Exercise-induced angina (exang) may be expressed as 1 if present and 0 if absent. The slope of the peak exercise ST segment (slope) may be expressed as 1 for upsloping, 2 for flat, and 3 for downsloping. Thalassemia (thal) may be expressed as 3 for normal, 6 for fixed defect, and 7 for diagnosis of heart disease. Existence of heart disease (num) may be expressed as 0 if absent and 1 if present.
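The numeric coding described above can be sketched as a set of lookup tables. The codes themselves come from this description; the raw field names (such as `fbs_mg_dl`), the label strings, and the function `encode_record` are hypothetical and only illustrate one way such an encoding might be applied.

```python
# Numeric codes as stated in the description.
SEX_CODES = {"male": 1, "female": 0}
CP_CODES = {"typical angina": 1, "atypical angina": 2,
            "non-anginal pain": 3, "asymptomatic": 4}
RESTECG_CODES = {"normal": 0, "st-t wave abnormality": 1,
                 "left ventricular hypertrophy": 2}
SLOPE_CODES = {"upsloping": 1, "flat": 2, "downsloping": 3}

def encode_fbs(mg_per_dl):
    """Fasting blood sugar: 1 if it exceeds 120 mg/dl, otherwise 0."""
    return 1 if mg_per_dl > 120 else 0

def encode_record(raw):
    """Encode a raw record (hypothetical field names) into numeric attribute values."""
    return {
        "sex": SEX_CODES[raw["sex"]],
        "cp": CP_CODES[raw["cp"]],
        "fbs": encode_fbs(raw["fbs_mg_dl"]),
        "restecg": RESTECG_CODES[raw["restecg"]],
        "exang": 1 if raw["exang"] else 0,
        "slope": SLOPE_CODES[raw["slope"]],
    }

raw = {"sex": "female", "cp": "atypical angina", "fbs_mg_dl": 118,
       "restecg": "normal", "exang": False, "slope": "flat"}
encoded = encode_record(raw)
print(encoded)
```

An unknown label raises a `KeyError` instead of being silently mapped, which is a deliberate choice in this sketch so that bad input is surfaced early.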

Accordingly, the first data set generated by the data set generating unit 31 may include information on the 14 attributes shown in FIG. 2.

In addition, the data set generating unit 31 may generate the second data set such that, when the integrated data set is generated, the attributes of the data acquired through the mobile device 20 (i.e., the mPHR attributes) match the attributes of the first data set (i.e., the EHR attributes).

Here, the second data set generated by the data set generating unit 31 may include, for example, four attributes related to heart disease. The four attributes included in the second data set, in other words the attributes of the mPHR data acquired from the mobile device, may include, for example, blood pressure, heart rate, blood sugar, and active calories. Accordingly, the data included in the first data set, the second data set, and the integrated data set described later may be, for example, heart disease-related data.

In addition, each of the first data set and the second data set generated by the data set generating unit 31 may consist of, for example, 920 records. Here, the 920 records may represent health-related records for 920 users; this number is merely an example to aid understanding of the present application and is not limiting.

After the first data set and the second data set have been generated, the preprocessing unit 32 may, in order to generate the integrated data set, perform preprocessing on the data included in each of the first data set including the electronic health records (EHR) and the second data set including the personal health records (mPHR) acquired through mobile devices. Accordingly, the first data set and the second data set may be generated before the preprocessing is performed.

Specifically, the preprocessing unit 32 may perform preprocessing on each individual record in the first data set and the second data set. In other words, the preprocessing unit 32 may preprocess each EHR-related record included in the first data set and each mPHR-related record included in the second data set.

When the preprocessing unit 32 determines that a missing value or an outlier exists in the data (i.e., records) included in the first data set or the second data set, it may perform preprocessing that replaces the missing value or outlier with the mean of the attribute values that belong to the same attribute within the data set containing that missing value or outlier. This preprocessing may also be referred to as a cleansing process.

Because missing values and outliers in a data set reduce the accuracy of analysis, the data set generating apparatus 30 may improve the accuracy of analysis by having the preprocessing unit 32 replace missing values and outliers in the first data set and the second data set with the mean of the corresponding attribute values. That is, by removing the missing values and outliers from the first data set and the second data set through preprocessing (i.e., replacing them with the mean of the attribute values), analysis based on the integrated data set generated by the data set generating apparatus 30 can be made more accurate.

For example, suppose that the value of the chest pain type (cp) attribute is missing in one of the 920 records in the first data set (i.e., the health-related record of one user), and that the chest pain type value is present in all of the other 919 records. In this case, the preprocessing unit 32 may replace the missing chest pain type value with the mean of the values of the same attribute within the first data set, that is, the mean of the 919 chest pain type values in the other 919 records.
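The cleansing step can be illustrated with a short sketch. It assumes that records are dictionaries, that a missing value is represented by `None`, and that outliers are identified by a caller-supplied predicate; the description does not state how outliers are detected, so the `is_outlier` rule below is purely hypothetical.

```python
from statistics import mean

def impute_with_attribute_mean(records, attribute, is_outlier=lambda v: False):
    """Replace missing values and outliers of one attribute with the mean of
    the remaining valid values of that attribute in the same data set."""
    valid = [r[attribute] for r in records
             if r[attribute] is not None and not is_outlier(r[attribute])]
    if not valid:
        raise ValueError(f"no valid values for attribute {attribute!r}")
    fill = mean(valid)
    cleaned = []
    for r in records:
        value = r[attribute]
        if value is None or is_outlier(value):
            cleaned.append({**r, attribute: fill})  # replaced by the attribute mean
        else:
            cleaned.append(dict(r))                 # kept as-is (copied)
    return cleaned

# Toy data: one missing cp value, one implausible blood pressure reading.
records = [
    {"cp": 2, "trestbps": 120},
    {"cp": None, "trestbps": 130},
    {"cp": 4, "trestbps": 999},
]
cleaned = impute_with_attribute_mean(records, "cp")
# Hypothetical outlier rule: readings above 300 are treated as outliers.
cleaned = impute_with_attribute_mean(cleaned, "trestbps", is_outlier=lambda v: v > 300)
print(cleaned[1]["cp"], cleaned[2]["trestbps"])
```

Note that averaging a coded attribute such as cp can yield a value that is not one of the defined codes; the sketch follows the mean replacement as described and does not round the result.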

In addition, the preprocessing unit 32 may set the format of the integrated data set (the integrated format) before the matching by the matching unit 33 is performed. That is, the preprocessing unit 32 may set a format suitable for integrating EHR and mPHR when generating the integrated data set. The format of the integrated data set may be set in consideration of the format of the first data set including the EHR and the format of the second data set including the mPHR, or may be set by a user.

After preprocessing has been performed on each of the first data set and the second data set, the matching unit 33 may generate, through matching between the data of the preprocessed first data set and the data of the preprocessed second data set, an integrated data set in which the two are integrated. That is, the integrated data set may be generated by matching the data (individual records) of the preprocessed first data set against the data (individual records) of the preprocessed second data set.

The matching unit 33 may perform matching on each individual record in the first data set and the second data set, matching individual records of the first data set against individual records of the second data set.

The matching unit 33 may generate the integrated data set through matching that considers whether a data attribute of the first data set and a data attribute of the second data set are the same.

Specifically, when the matching unit 33 determines that the data attribute of the first data set and the data attribute of the second data set are not the same, it may combine the data attribute of the second data set with the data attributes of the first data set. When the matching unit 33 determines that the data attribute of the first data set and the data attribute of the second data set are the same, it may overwrite (in other words, update) the data attribute of the first data set with the data attribute of the second data set. More precisely, when the data attributes are determined to be the same, the attribute value in the first data set may be overwritten with the attribute value in the second data set.

For example, the data attributes of the second data set are blood pressure, heart rate, blood sugar, and active calories. Of these, blood pressure, heart rate, and blood sugar overlap with (are the same as) data attributes of the first data set, namely blood pressure (trestbps), heart rate (thalach), and blood sugar (fbs). Accordingly, when blood pressure, heart rate, or blood sugar is matched during attribute matching between the two data sets, the matching unit 33 may overwrite the data attribute of the first data set with the data attribute of the second data set. When active calories is matched during attribute matching between the two data sets, the matching unit 33 may combine the data attribute of the second data set with the data attributes of the first data set.

Here, when matching the data attributes of the first data set against the data attributes of the second data set, the matching unit 33 may combine or overwrite attributes in consideration of the format of the integrated data set (the integrated format) set by the preprocessing unit 32. In other words, through matching, the matching unit 33 may overwrite or combine the individual attributes (features) in the EHR and mPHR in the unified format of the integrated data set (i.e., the integrated format). As a specific example, when the data attribute of the first data set and the data attribute of the second data set are determined to be the same, the matching unit 33 may replace the attribute value of the first data set with the attribute value of the second data set, taking the format of the integrated data set into account.

In performing the matching process, the data set generating apparatus 30 may generate the integrated data set by overwriting the data attribute of the first data set with the data attribute of the second data set when the two are the same, and by combining the data attribute of the second data set with the data attributes of the first data set when they are not the same.
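The overwrite-or-combine rule can be sketched for a single pair of records as follows. The sketch assumes that the EHR record and the mPHR record have already been paired for the same user and that the mPHR values are already expressed in the integrated format; the mPHR attribute names and the mapping table `MPHR_TO_EHR` are hypothetical names introduced for illustration.

```python
# Hypothetical mapping from mPHR attribute names to the EHR attributes they duplicate.
MPHR_TO_EHR = {
    "blood_pressure": "trestbps",
    "heart_rate": "thalach",
    "blood_sugar": "fbs",
}

def match_record(ehr_record, mphr_record):
    """Merge one EHR record with one mPHR record into an integrated record."""
    integrated = dict(ehr_record)
    for name, value in mphr_record.items():
        ehr_name = MPHR_TO_EHR.get(name)
        if ehr_name is not None:
            integrated[ehr_name] = value  # same attribute: overwrite the EHR value
        else:
            integrated[name] = value      # different attribute: combine (append)
    return integrated

# Toy records for illustration only.
ehr = {"age": 63, "sex": 1, "cp": 1, "trestbps": 145, "chol": 233, "fbs": 1,
       "restecg": 2, "thalach": 150, "exang": 0, "oldpeak": 2.3, "slope": 3,
       "ca": 0, "thal": 6, "num": 0}
mphr = {"blood_pressure": 128, "heart_rate": 142, "blood_sugar": 0,
        "active_calories": 310}
integrated = match_record(ehr, mphr)
print(len(integrated), integrated["trestbps"], integrated["active_calories"])
```

With 14 EHR attributes and one non-overlapping mPHR attribute, the integrated record in this sketch ends up with 15 attributes.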

Through this process, the data set generating apparatus 30 may generate an integrated data set (i.e., one in which EHR and mPHR are integrated) that retains both the advantage of EHR (reliable medical information) and the advantage of mPHR (accurate capture of the user's current state), so that complementary information from EHR and mPHR can be accessed. That is, the generated integrated data set enables more accurate and reliable health-related diagnosis or analysis that reflects the user's current state.

The integrated data set generated by the matching unit 33 may include, for example, 15 attributes and 920 records related to heart disease. Here, the 15 attributes are the 14 EHR attributes together with active calories, which is an mPHR attribute.

In addition, the integrated data set may be updated from the second data set at a preset period so that the current health state of a specific user (patient) can be accurately analyzed (determined). In other words, the data in the integrated data set may be updated at a preset period from the mPHR, the mobile-based personal health record corresponding to the second data set, so that real-time and accurate analysis of a specific user's current health state is possible. The preset period may be set by a user, for example in units of hours or days.
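A periodic update of this kind might look like the following sketch. It assumes that the latest mPHR values already use the attribute names of the integrated record and that the caller tracks the time of the last update; the function `refresh_if_due` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta

def refresh_if_due(integrated, latest_mphr, last_update, now, period):
    """Overwrite integrated values with the latest mPHR values once the
    preset period has elapsed; otherwise leave the record unchanged."""
    if now - last_update < period:
        return integrated, last_update
    return {**integrated, **latest_mphr}, now

# Toy values for illustration only.
record = {"trestbps": 145, "thalach": 150, "active_calories": 300}
latest = {"trestbps": 132, "active_calories": 420}
t0 = datetime(2017, 12, 11, 9, 0)
period = timedelta(hours=1)  # user-configured period, e.g. hourly or daily

# Only 30 minutes have passed: no update.
early, stamp_early = refresh_if_due(record, latest, t0, t0 + timedelta(minutes=30), period)
# 90 minutes have passed: the record is refreshed.
late, stamp_late = refresh_if_due(record, latest, t0, t0 + timedelta(minutes=90), period)
print(early["trestbps"], late["trestbps"])
```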

In addition, the more records the integrated data set contains, the more accurately a user's health state can be diagnosed on the basis of the integrated data set.

The data set generating apparatus 30 may generate and provide an integrated data set in which EHR and mPHR are integrated (merged), and the integrated data set may enable high-precision disease diagnosis. Here, the EHR may include data related to the patient's overall health state, including the past medical records of the user (patient), so that the user's medical history, vital signs, medications, radiology records, and the like can be checked. The mPHR is data recorded by the user's personal terminal (mobile device) and can provide information on the user's real-time health state over time; it thereby allows the user (patient) to track his or her health state outside medical institutions, so that the user's current health state can be diagnosed effectively.

In view of this, the data set generating apparatus 30 may enable, through the integrated data set, an accurate health diagnosis service based on both EHR and mPHR, so that it plays a complementary role in diagnosing the user's (patient's) disease. In other words, the present application provides accurate health diagnosis information about the user by means of the integrated data set, and allows complementary information from EHR and mPHR to be accessed when diagnosing, predicting, or analyzing whether the user has a disease. For example, the integrated data set of the present application enables accurate diagnosis, prediction, or analysis of the presence or absence of heart disease.

Hereinafter, based on the above, a technique for predicting a user's likelihood of disease by using the integrated data set generated by the data set generating apparatus 30 will be described.

FIG. 3 is a diagram showing a schematic configuration of a disease prediction system 200 according to an embodiment of the present application. For reference, disease prediction in the present application may be understood broadly to include predicting the likelihood that a disease is present and detecting or measuring whether a disease is present.

Referring to FIG. 3, the disease prediction system 200 according to an embodiment of the present application may include a first terminal device 50 and a disease prediction apparatus 70.

The first terminal device 50 may provide the health data of a prediction subject. That is, the first terminal device 50 may measure the health data of the prediction subject, the person whose likelihood of disease is to be predicted, and provide it to the disease prediction apparatus 70. Here, the health data is data related to the health state of the prediction subject and, being data measured through the first terminal device 50, may mean mPHR.

In addition, the first terminal device 50 may be any one of the plurality of mobile devices 20 described above with reference to FIG. 1. Therefore, even where omitted below, the description given for the mobile device 20 applies equally to the first terminal device 50.

In addition, data may be transmitted and received between the first terminal device 50 and the disease prediction apparatus 70 through a network 60. Here, the network 60 may be the same network as the network 40 described above. Therefore, even where omitted below, the description given for the network 40 applies equally to the network 60.

The disease prediction apparatus 70 may include a receiving unit 71, a generating unit 72, and a predicting unit 73.

The receiving unit 71 may receive the health data of the prediction subject from the first terminal device 50.

The generating unit 72 may generate a disease prediction model by using an integrated data set generated on the basis of electronic health records (EHR) for a plurality of users and mobile-based personal health records (mPHR) for the plurality of users acquired through mobile devices.

Here, the integrated data set may be generated by performing preprocessing on the data included in each of the first data set including the electronic health records and the second data set including the mobile-based personal health records, and then matching the data of the preprocessed first data set against the data of the preprocessed second data set. This integrated data set may be generated by the data set generating apparatus 30 described above. Because the process of generating the integrated data set has been described in detail above, a detailed description is omitted here.

As one example, after the health data has been received through the receiving unit 71, the generating unit 72 may generate the disease prediction model in consideration of at least one of the characteristics (state, type) of the prediction subject and the disease type to be predicted. The predicting unit 73 may then predict, on the basis of the generated disease prediction model, the likelihood of disease for the prediction subject corresponding to the health data received by the receiving unit 71.

Here, the characteristics (state, type) of the prediction subject may mean the current health state of the prediction subject based on the health data. For example, the characteristics of the prediction subject considered when generating the disease prediction model may be state information indicating the current levels of the subject's blood pressure, heart rate, blood sugar, and active calories in relation to the subject's mPHR. The disease type to be predicted may mean information on which type of disease likelihood is to be predicted for the prediction subject. Examples of disease types to be predicted include heart disease, skin disease, and cerebrovascular disease, but are not limited thereto. The disease type to be predicted may be set by a user.

As another example, the generating unit 72 may generate, based on the data included in the integrated data set, a plurality of disease prediction models that take the characteristics of a plurality of users into account. After health data is received through the receiving unit 71, the generating unit 72 may select, from among the plurality of disease prediction models, an optimal disease prediction model for the predicted subject in consideration of at least one of the characteristics (state) of the predicted subject and the type of disease to be predicted. The predicting unit 73 may then predict, based on the optimal disease prediction model selected by the generating unit 72, the disease possibility for the predicted subject corresponding to the health data received by the receiving unit 71.

That is, as one example, the disease prediction model used to predict the disease possibility for the predicted subject may be generated after the health data is received, in consideration of at least one of the characteristics of the predicted subject based on the received health data and the type of disease to be predicted. As another example, the disease prediction apparatus 70 may generate a plurality of disease prediction models based on the integrated data set before the health data is received. In that case, once the health data is received, the disease prediction model used for the prediction may be chosen by selecting an optimal disease prediction model from among the plurality of disease prediction models in consideration of at least one of the characteristics (state) of the predicted subject and the type of disease to be predicted.

In addition, the disease prediction model generated by the generating unit 72 may include a plurality of rules related to a plurality of attributes of the data included in the integrated data set. For example, the disease prediction model may include a plurality of rules related to the fifteen attributes included in the integrated data set. Because the fifteen attributes have been described in detail above, their description is omitted here.

In addition, when the disease prediction model is generated, the number of rules combined in the model and the order in which they are combined may be determined in consideration of at least one of the characteristics of the predicted subject and the type of disease to be predicted. The number and order of the combined rules may also be determined in consideration of attribute value criteria defined (prescribed) in the medical field for the plurality of attributes included in the integrated data set. The number and order of the combined rules may also be determined automatically.

In addition, the disease prediction model may be generated by applying a decision tree to the data included in the integrated data set. When generating the disease prediction model, the generating unit 72 may use, for example, a decision tree that provides IF-THEN rules, but the model is not limited to this.
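The relationship between a decision tree and IF-THEN rules can be sketched as follows: every root-to-leaf path of the tree reads as one rule. This is a minimal illustration only. The nested-dictionary tree format, the function name `tree_to_rules`, and the toy thresholds and leaf labels are assumptions made for the example and are not taken from the figures of this document.

```python
def tree_to_rules(node, conditions=()):
    """Enumerate every root-to-leaf path of a binary tree as an IF-THEN rule.

    A leaf is {"label": ...}; an inner node is
    {"attr": name, "threshold": t, "left": subtree, "right": subtree},
    where "left" is taken when attr < t and "right" otherwise.
    Returns a list of (list_of_conditions, label) pairs.
    """
    if "label" in node:
        return [(list(conditions), node["label"])]
    below = f'{node["attr"]} < {node["threshold"]}'
    at_or_above = f'{node["attr"]} >= {node["threshold"]}'
    return (tree_to_rules(node["left"], conditions + (below,))
            + tree_to_rules(node["right"], conditions + (at_or_above,)))


# Hypothetical toy tree; thresholds and labels are made up for illustration.
toy_tree = {
    "attr": "trestbps", "threshold": 132,
    "left": {
        "attr": "chol", "threshold": 430,
        "left": {"label": "no"},
        "right": {"label": "yes"},
    },
    "right": {"label": "no"},
}

for conds, label in tree_to_rules(toy_tree):
    print("IF " + " AND ".join(conds) + " THEN " + label)
```

Each printed line corresponds to one leaf, so the number of rules equals the number of leaves in the tree.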

The predicting unit 73 may predict, based on the disease prediction model, the disease possibility for the predicted subject corresponding to the health data (or may measure or detect whether a disease is present). As a specific example, the predicting unit 73 may predict the disease possibility based on a disease prediction model generated when the health data is received. Alternatively, the predicting unit 73 may predict the disease possibility based on an optimal disease prediction model selected from among a plurality of disease prediction models.

The disease prediction apparatus 70 according to an embodiment of the present invention generates a disease prediction model using an integrated data set in which EHR and mPHR are combined, and predicts the disease possibility related to the user's health state based on the generated model. Prediction accuracy can therefore be improved compared with conventional approaches that use only EHR or only mPHR.

The following describes experimental results that demonstrate the advantage of the integrated-data-set-based disease prediction proposed herein over conventional prediction using only EHR or only mPHR.

In one experimental example of the present invention, the experiment was carried out using R version 3.4.1.

In this experimental example, to compare the performance of the integrated data set, a disease prediction model for heart disease is generated using a decision tree that provides IF-THEN rules for data classification. The 'num' attribute in the integrated data set is used to classify the presence of heart disease (in other words, the possibility of heart disease) for a user (for example, a predicted subject or a user recorded in the integrated data set) as 'yes' (present, or possibly present) or 'no' (not present, or not possibly present).

The following describes in more detail an experimental comparison of performance (accuracy) between the disease prediction model generated from the integrated data set of the present invention, which includes both EHR and mPHR, and conventional EHR-based or mPHR-based disease prediction models.

FIG. 4 is a diagram illustrating an example of a disease prediction model generated by applying a decision tree to the integrated data set according to an embodiment of the present invention.

Referring to FIG. 4, the disease prediction model generated in one experimental example of the present invention may have, for example, 15 rules. According to these rules, a predicted subject (or a user recorded in the integrated data set) may be classified as 'yes' or 'no' with respect to the possibility of disease (in other words, the presence of disease). In this experimental example the prediction concerns the possibility of heart disease, but the prediction is not limited to this and may be made for various other diseases.

Accordingly, the possibility of heart disease for every user recorded in the integrated data set, or for a predicted subject (for example, a new predicted subject) whose data is received through the receiving unit 71, can be predicted with the decision-tree-based disease prediction model.

For example, according to the disease prediction model generated in one experimental example of the present invention, if under the first rule the 'cp' value, which indicates the chest pain type in the health data of a particular user (either a new predicted subject or a user recorded in the integrated data set), is not 1, 2, or 3, then under the second rule the user's 'trestbps' value, which indicates resting blood pressure, is checked. If the 'trestbps' value is less than 132 under the second rule, the attribute associated with the third rule is checked in the user's health data; in this example that attribute is cholesterol, 'chol'. If the user's 'chol' value is determined to be greater than 430 under the third rule, the possibility of heart disease for that user (that is, the presence of heart disease) is predicted as 'yes'. For reference, the 'hr' attribute in FIG. 4 denotes the heart rate associated with the mPHR.
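The single rule path walked through above can be written out as nested conditions. The sketch below covers only that one path; the function name `follow_described_path` and the choice to return `None` for every branch the text does not describe are assumptions for illustration, since the other branches of FIG. 4 are not spelled out here.

```python
def follow_described_path(record):
    """Apply only the rule path described in the text for FIG. 4.

    `record` is a dict with the attributes 'cp' (chest pain type),
    'trestbps' (resting blood pressure) and 'chol' (cholesterol).
    Returns 'yes' when the record follows the described path to its leaf,
    and None when it leaves that path (those branches are not described).
    """
    # First rule: chest pain type is not 1, 2, or 3.
    if record["cp"] in (1, 2, 3):
        return None
    # Second rule: resting blood pressure below 132.
    if not record["trestbps"] < 132:
        return None
    # Third rule: cholesterol greater than 430 leads to the 'yes' leaf.
    if record["chol"] > 430:
        return "yes"
    return None


# Illustrative values only, not taken from any data set.
print(follow_described_path({"cp": 4, "trestbps": 120, "chol": 450}))
```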

FIG. 5 is a diagram illustrating an example of a disease prediction model generated by applying a decision tree to conventional EHR.

Referring to FIG. 5, in the EHR-based disease prediction model, 'cp' is considered as the first rule when predicting the possibility of disease. However, because the EHR-based disease prediction model has fewer rules than the integrated-data-set-based disease prediction model, it has the problem that its accuracy is lower than that of the model of the present invention.

FIG. 6 is a diagram illustrating an example of a disease prediction model generated by applying a decision tree to conventional mPHR.

Referring to FIG. 6, the mPHR-based disease prediction model may have, for example, five rules. This is because the mPHR contains only four attributes related to heart disease (blood pressure, heart rate, blood sugar, and active calories). Like the EHR-based model, the mPHR-based disease prediction model therefore has fewer rules than the integrated-data-set-based disease prediction model, and has the problem that its accuracy is lower than that of the model of the present invention.

The following analyzes the decision trees for the EHR, the mPHR, and the integrated data set, and evaluates the accuracy of the prediction results derived from each decision tree.

FIG. 7 is a diagram showing the accuracy of the integrated-data-set-based disease prediction model according to an embodiment of the present invention in comparison with the EHR-based and mPHR-based models.

Referring to FIG. 7, the values n and p used to calculate the accuracy of a disease prediction model are defined as follows. n is the number of data elements whose actual value is 'no' and which the disease prediction model also predicts as 'no'. p is the number of data elements whose actual value is 'yes' and which the disease prediction model also predicts as 'yes'. The accuracy of the disease prediction model (that is, of the decision tree) is calculated as (n + p) / (total number of records in the data set).
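The accuracy definition above can be sketched in a few lines. The function name `accuracy` and the use of the strings 'yes' and 'no' as labels are assumptions for this example; the sample labels are invented and are not experimental data.

```python
def accuracy(actual, predicted):
    """Compute (n + p) / total, following the definition in the text.

    n: records whose actual value is 'no' and that are predicted 'no'.
    p: records whose actual value is 'yes' and that are predicted 'yes'.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one record is required")
    n = sum(1 for a, pred in zip(actual, predicted) if a == "no" and pred == "no")
    p = sum(1 for a, pred in zip(actual, predicted) if a == "yes" and pred == "yes")
    return (n + p) / len(actual)


# Invented labels for illustration: n = 2, p = 2, total = 5.
actual = ["yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "yes", "yes", "no"]
print(accuracy(actual, predicted))
```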

The calculation shows that the accuracy of the integrated-data-set-based disease prediction model according to an embodiment of the present invention is 0.82. The accuracy of the disease prediction model based on the data set containing only EHR is 0.79, and the accuracy of the disease prediction model based on the data set containing only mPHR is 0.78.

These results show that the integrated-data-set-based disease prediction model of the present invention has the highest accuracy of the three models. Specifically, its accuracy is about 3 and 4 percentage points higher than that of the existing EHR-based and mPHR-based disease prediction models, respectively (0.82 versus 0.79 and 0.78). In other words, the integrated-data-set-based disease prediction model predicts a user's disease more accurately than a model based on EHR or mPHR alone.

The present invention can thus provide an integrated data set that enables more effective and accurate medical services. That is, by allowing meaningful information to be extracted from the integrated data set, the present invention can provide higher-quality medical services.

The operation flow of the present invention is briefly reviewed below, based on the details described above.

FIG. 8 is a flowchart of a method of generating a data set for providing a healthcare service according to an embodiment of the present invention.

The data set generating method shown in FIG. 8 may be performed by the data set generating apparatus 30 described above. Therefore, even where it is omitted below, the description given for the data set generating apparatus 30 applies equally to the data set generating method.

Referring to FIG. 8, in step S11 of the method of generating a data set for providing a healthcare service according to an embodiment of the present invention, preprocessing may be performed on the data contained in each of a first data set including electronic health records (EHR) and a second data set including mobile-based personal health records (mPHR) obtained through a mobile device.

In step S11, when a missing value or an outlier is determined to exist in the data contained in the first data set or the second data set, the preprocessing may replace that missing value or outlier with the mean of the values of the same attribute within the data set to which it belongs.
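The mean-replacement preprocessing of step S11 can be sketched for a single attribute column as follows. Two points are assumptions of this sketch rather than statements from the text: the criterion for an outlier is supplied by the caller as a hypothetical `is_valid` predicate, and the mean is computed over the remaining valid values only. The function name `impute_with_mean` is also hypothetical.

```python
import math


def impute_with_mean(values, is_valid):
    """Replace missing values and outliers in one attribute column by the mean.

    values:   list of numbers; None or NaN marks a missing value.
    is_valid: caller-supplied predicate returning False for outliers
              (the outlier criterion itself is an assumption of this sketch).
    The mean is taken over the valid, non-missing values of the same column.
    """
    def usable(v):
        if v is None:
            return False
        if isinstance(v, float) and math.isnan(v):
            return False
        return is_valid(v)

    valid = [v for v in values if usable(v)]
    if not valid:
        raise ValueError("no valid values to compute a mean from")
    mean = sum(valid) / len(valid)
    return [v if usable(v) else mean for v in values]


# Invented blood-pressure column: one missing value and one implausible value.
column = [120, None, 130, 999]
print(impute_with_mean(column, lambda v: 0 < v < 300))
```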

Next, in step S12, an integrated data set that combines the data of the preprocessed first data set and the data of the preprocessed second data set may be generated by matching the data of the two preprocessed data sets against each other.

In step S12, the integrated data set may be generated through matching that considers whether a data attribute of the first data set and a data attribute of the second data set are the same.

In step S12, when a data attribute of the second data set is determined not to be the same as any data attribute of the first data set, that attribute of the second data set may be appended to the data attributes of the first data set. When a data attribute of the first data set and a data attribute of the second data set are determined to be the same, the attribute of the first data set may be overwritten with the attribute of the second data set.
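The attribute matching of step S12 can be sketched at the level of a single pair of records. This sketch assumes that an EHR record and an mPHR record have already been paired, and it represents each record as a dictionary from attribute name to value; how records are paired is not covered here. The function name `merge_records` and the sample values are hypothetical.

```python
def merge_records(ehr_record, mphr_record):
    """Merge one EHR record with one mPHR record by attribute name.

    - An mPHR attribute that is absent from the EHR record is appended.
    - An mPHR attribute that is also present in the EHR record overwrites
      the EHR value.
    The input dictionaries are left unchanged.
    """
    merged = dict(ehr_record)
    for name, value in mphr_record.items():
        # Same assignment covers both cases: append if new, overwrite if shared.
        merged[name] = value
    return merged


# Invented values for illustration: 'trestbps' is shared, 'hr' is mPHR-only.
ehr = {"cp": 4, "trestbps": 140, "chol": 250}
mphr = {"trestbps": 128, "hr": 72}
print(merge_records(ehr, mphr))
```

Because the mobile measurement overwrites the shared attribute, the merged record carries the mPHR value wherever both sources provide one.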

The method of generating a data set for providing a healthcare service according to an embodiment of the present invention may further include, before step S11, a step of generating the first data set and the second data set.

In the step of generating the first data set and the second data set, the first data set may be generated by combining data belonging to a plurality of data sets from different locations that are provided by the UCI (University of California-Irvine) Machine Learning Repository. The second data set may be generated so that the attributes of the data obtained through the mobile device match the attributes of the first data set.

In the above description, steps S11 and S12 may be further divided into additional steps or combined into fewer steps, depending on the implementation. Some steps may be omitted as necessary, and the order of the steps may be changed.

FIG. 9 is a flowchart of a disease prediction method according to an embodiment of the present invention.

The disease prediction method shown in FIG. 9 may be performed by the disease prediction apparatus 70 described above. Therefore, even where it is omitted below, the description given for the disease prediction apparatus 70 applies equally to the disease prediction method.

Referring to FIG. 9, in step S21 of the disease prediction method according to an embodiment of the present invention, health data of a predicted subject may be received.

Next, in step S22, a disease prediction model may be generated using an integrated data set generated based on electronic health records (EHR) and mobile-based personal health records (mPHR) obtained through a mobile device.

In step S22, the disease prediction model may be generated after the health data is received, in consideration of at least one of the characteristics (state) of the predicted subject and the type of disease to be predicted.

Alternatively, in step S22, a plurality of disease prediction models that take the characteristics of a plurality of users into account may be generated based on the data included in the integrated data set, and, after the health data is received, an optimal disease prediction model for the predicted subject may be selected from among the plurality of disease prediction models in consideration of at least one of the characteristics (state) of the predicted subject and the type of disease to be predicted. In step S23, the disease possibility for the predicted subject may then be predicted based on the optimal disease prediction model.
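One way to picture selecting a model from several pre-built models is sketched below. The text does not state how the optimal model is chosen, so everything here is an assumption made for illustration: each candidate carries a disease type and a hypothetical `applies_to` predicate over the subject's health data, and the first matching candidate is returned. The class name `CandidateModel`, the function name `select_model`, and the sample thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class CandidateModel:
    disease_type: str                                # e.g. "heart"
    applies_to: Callable[[Dict[str, float]], bool]   # hypothetical subject-profile test
    predict: Callable[[Dict[str, float]], str]       # returns "yes" or "no"


def select_model(models: List[CandidateModel],
                 health_data: Dict[str, float],
                 disease_type: str) -> Optional[CandidateModel]:
    """Return the first model that matches the disease type and the subject."""
    for model in models:
        if model.disease_type == disease_type and model.applies_to(health_data):
            return model
    return None


# Two invented heart-disease models for different subject profiles.
models = [
    CandidateModel("heart", lambda d: d["hr"] >= 100, lambda d: "yes"),
    CandidateModel("heart", lambda d: True, lambda d: "no"),
]
chosen = select_model(models, {"hr": 72.0}, "heart")
print(chosen.predict({"hr": 72.0}) if chosen else "no model available")
```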

The disease prediction model in step S22 may include a plurality of rules related to a plurality of attributes of the data included in the integrated data set. The number of rules combined and the order in which they are combined may be determined in consideration of at least one of the characteristics of the predicted subject and the type of disease to be predicted.

The disease prediction model may be generated by applying a decision tree to the data included in the integrated data set.

The integrated data set in step S22 may be generated by preprocessing the data contained in each of a first data set including electronic health records and a second data set including personal health records obtained through a mobile device, and then matching the data of the preprocessed first data set against the data of the preprocessed second data set.

Next, in step S23, the disease possibility for the predicted subject corresponding to the health data may be predicted based on the disease prediction model.

In the above description, steps S21 to S23 may be further divided into additional steps or combined into fewer steps, depending on the implementation. Some steps may be omitted as necessary, and the order of the steps may be changed.

The data set generating method and the disease prediction method according to an embodiment of the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.

The foregoing description is provided for illustration, and a person of ordinary skill in the art to which the present invention pertains will understand that it can readily be modified into other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present invention is defined by the claims that follow rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: Data set generation system
30: Data set generating apparatus
31: Data set generating unit
32: Preprocessing unit
33: Matching unit
200: Disease prediction system
70: Disease prediction apparatus
71: Receiving unit
72: Generating unit
73: Predicting unit

Claims

A disease prediction method comprising:
receiving health data of a predicted subject;
generating a disease prediction model using an integrated data set generated based on electronic health records (EHR) and mobile personal health records (mPHR) obtained via a mobile device; and
predicting a disease possibility for the predicted subject corresponding to the health data based on the disease prediction model.

The method according to claim 1,
wherein, in the generating,
the disease prediction model is generated, after the health data is received, in consideration of at least one of the characteristics of the predicted subject and the type of disease to be predicted.

The method according to claim 1,
wherein the generating comprises:
generating, based on data included in the integrated data set, a plurality of disease prediction models in which characteristics of a plurality of users are taken into consideration, and, after the health data is received, selecting from among the plurality of disease prediction models an optimal disease prediction model for the predicted subject in consideration of at least one of the characteristics of the predicted subject and the type of disease to be predicted,
and wherein the predicting comprises:
predicting the disease possibility for the predicted subject based on the optimal disease prediction model.

The method according to claim 1,
wherein the disease prediction model includes a plurality of rules associated with a plurality of attributes of data contained in the integrated data set, and
wherein the number of rules combined and the order in which they are combined are determined in consideration of at least one of the characteristics of the predicted subject and the type of disease to be predicted.

The method according to claim 1,
wherein the disease prediction model
is generated by applying a decision tree to the data contained in the integrated data set.

The method according to claim 1,
wherein the integrated data set
is generated by preprocessing the data contained in each of a first data set including the electronic health records and a second data set including the personal health records obtained through the mobile device, and matching the data of the preprocessed first data set against the data of the preprocessed second data set.

A disease prediction apparatus comprising:
a receiving unit for receiving health data of a predicted subject;
a generating unit for generating a disease prediction model using an integrated data set generated based on electronic health records (EHR) and mobile personal health records (mPHR) obtained through a mobile device; and
a predicting unit for predicting a disease possibility for the predicted subject corresponding to the health data based on the disease prediction model.

A disease prediction system comprising:
a first terminal device for providing health data of a predicted subject; and
a disease prediction apparatus that receives the health data from the first terminal device, generates a disease prediction model using an integrated data set generated based on electronic health records (EHR) and mobile personal health records (mPHR) obtained via a mobile device, and predicts a disease possibility for the predicted subject corresponding to the health data based on the disease prediction model.

A computer-readable recording medium on which a program for executing the method of any one of claims 1 to 6 is recorded.