KR102421172B1

KR102421172B1 - Smart Healthcare Monitoring System and Method for Heart Disease Prediction Based On Ensemble Deep Learning and Feature Fusion

Info

Publication number: KR102421172B1
Application number: KR1020200027108A
Authority: KR
Inventors: 곽경섭; 알리펄만
Original assignee: 인하대학교 산학협력단
Priority date: 2020-03-04
Filing date: 2020-03-04
Publication date: 2022-07-14
Also published as: KR20210112041A

Abstract

A smart healthcare monitoring method and system for heart disease prediction based on ensemble deep learning and feature fusion are presented. The proposed method includes the steps of: collecting data on heart disease patients through wearable sensor measurements and electronic medical tests; extracting Framingham Risk Functions (FRF) from Electronic Medical Records (EMR); combining the FRF with the data collected through wearable sensor measurements using a feature fusion approach to generate medical data on heart disease; selecting features based on an information gain technique and calculating feature weights based on conditional probability; and training an ensemble deep learning classifier to predict a patient's heart disease using the selected features and the calculated feature weights.

Description

Smart Healthcare Monitoring System and Method for Heart Disease Prediction Based On Ensemble Deep Learning and Feature Fusion

The present invention relates to a smart healthcare monitoring method and system for heart disease prediction based on ensemble deep learning and feature fusion.

Accurate prediction of heart disease is essential for treating heart disease patients effectively before a heart attack occurs. This goal can be achieved by using an optimal machine learning model with rich medical data on heart disease. Various systems based on machine learning have recently been proposed to predict and diagnose heart disease. However, these systems cannot handle high-dimensional data sets because they lack a smart framework that can use different data sources for heart disease prediction. Existing systems also rely on conventional techniques to select features from a data set and compute a general weight for each feature based on its importance. These methods have failed to improve the performance of heart disease diagnosis.

Automatic prediction of heart disease is one of the most essential and challenging real-world health problems. Heart disease affects the function of blood vessels and causes coronary artery infections that weaken patients, especially adults and the elderly. According to the World Health Organization (WHO), cardiovascular disease causes more than 18 million deaths each year. In addition, the United States spends $1 billion per day on treating heart disease. Heart diseases such as stroke, heart attack, and high blood pressure are the leading causes of death in the United States. Early prediction of heart disease is therefore very important for treating heart disease patients effectively before a heart attack or stroke occurs.

Cardiovascular disease can be identified by performing medical tests and by using wearable sensors. However, extracting valuable heart disease risk factors from electronic medical tests is difficult because doctors are trying to diagnose patients quickly and accurately. Electronic medical records (EMRs) are unstructured and grow continuously in size because of daily medical tests. Wearable sensors are now also used to monitor a patient's body continuously, both internally and externally, in order to detect heart disease. However, wearable sensor data for heart disease prediction are corrupted by signal artifacts such as missing values and noise, which degrades system performance and produces inaccurate results. First, using wearable sensors and EMRs together is an important and difficult task when monitoring heart disease patients. Second, extracting relevant and meaningful features from the data is a challenge for heart disease prediction. An intelligent system is therefore needed that can automatically fuse the information extracted from both sensor data and EMRs, analyze the extracted data to identify hidden symptoms of heart disease, and predict heart disease before a heart attack occurs.

Several systems have been proposed that use data mining techniques and hybrid models to predict and diagnose cardiovascular disease. Data mining techniques extract risk factors from unstructured text data. A hybrid model integrates two different methods that work better together than either method alone. These hybrid models basically involve two main steps. In the first step, a feature selection or feature weighting approach is applied to select a subset of features or to identify feature weights. In the second step, the feature subset or the weights are used as input to a classifier that predicts heart disease. However, the data sets collected for heart disease contain relevant features along with many redundant and irrelevant ones. Both redundant and irrelevant features introduce confusion and noise into the definition of the target class. Handling these features is not only time consuming but also affects classification accuracy. In addition, existing heart disease diagnosis systems are based on general feature weighting methods. Such methods assign each feature the same weight for all classes. Moreover, they rely on uncertain combination operations to establish the importance of discriminating features and lack theoretical support, which increases the mean squared error (MSE) and reduces the accuracy of the prediction model. It is therefore necessary to remove unnecessary features and to assign specific weights to the features before applying the classification model.

Wearable sensors and EMRs play an important role in healthcare monitoring systems for heart disease patients. However, extracting features from sensor data and EMRs and then combining them into structured data is a difficult task. Selecting features from the structured data and then assigning them meaningful weights is another challenge for machine learning (ML) based systems. We therefore first review wearable sensor based heart disease diagnosis systems, and then focus on information extraction from text data followed by feature fusion. We also present a brief review of how feature importance is identified in healthcare data in the area of heart disease prediction.

In recent years, various systems using wearable sensors have been proposed to improve the process of heart disease prediction. Al-Makhadmeh and Tolba presented a system based on wearable medical devices that collects details about heart disease patients before and after a heart attack. They transmitted the collected data to a healthcare system and then used a feature extraction technique and a deep learning model to extract important features and classify them accurately. However, this system is limited in terms of efficient feature extraction and feature weighting. It also uses 23 attributes for training, which increases system complexity and dimensionality. A system called HealthFog was presented to automatically treat heart disease patients using Internet of Things (IoT) devices and a deep learning model. The main purpose of this system is to manage heart disease patient data from IoT devices efficiently. Another novel framework for a decision support system was presented for detecting disease in patients. This system integrates medical data with data collected from wearable medical devices and uses a multilayer architecture together with deep learning for disease diagnosis. A hybrid recommendation system based on IoT devices was presented to diagnose heart disease. This system recommends a diet plan and physical activity according to the condition of the heart disease patient. However, the recommendations are based on simple rules, and semantic information is needed to extract more sensitive information about the patient for accurate recommendations. An IoT-based disease prediction system using a fuzzy neural classifier was also presented. This system provides a new framework for mobile healthcare monitoring systems for diagnosing serious diseases. In addition, a three-layer framework with an ML model was presented for storing and processing heart disease patient data from wearable devices. The first layer collects physiological data from sensors, the second layer stores the medical data in the cloud, and the third layer predicts heart disease using logistic regression. However, this system was evaluated with only seven features, which is insufficient to determine the health status of a heart disease patient accurately. The data preprocessing step is also missing, which makes it difficult to understand how the data were prepared for the ML classifier.

Extracting valuable features from medical text data and fusing them with sensor data is another key issue for heart disease diagnosis systems. Feature extraction techniques retrieve meaningful information from medical big data. Data fusion merges different data sources to produce relevant and valuable data. Various models have recently been proposed that extract features from medical text data and combine sensor data with other data. A real-time system based on an ML classifier was presented to predict heart disease. This system adopts two feature extraction approaches (unification and mitigation) to extract important features from a healthcare data set. A new and robust architecture was presented for mining EMRs to diagnose heart disease. This system uses a word embedding model for feature identification and long short-term memory (LSTM) for heart failure prediction. Another system used ultrasound examination reports to predict in-hospital mortality of heart disease patients. This system extracts features using text mining techniques and then applies a deep learning model for mortality prediction. A new text mining approach was presented to extract information from EMRs. The authors of this system use a rule-based engine that automates information extraction and decision-making tasks. However, many medical records contain unstructured data, which is difficult for a rule-based engine to process. Heart disease risk scores have also been identified from patients' unstructured EMRs. In that system, the authors use text mining techniques to extract important heart-related factors from unstructured EMRs and calculate heart disease risk scores for diabetic patients.

Wearable sensors have been used to detect a user's emotions for music recommendation. In that system, the authors used feature-level fusion, which extracts features independently from each sensor and then combines the extracted features for emotion detection. In another system, the authors used both environmental sensors and wearable body sensors for emotion detection. They applied all three levels of fusion (data, feature, and decision) to investigate the impact of the environment on human health. A system based on a recurrent convolutional neural network (RCNN) was presented to predict disease risk. This system extracts features from the patient's structured and unstructured data and fuses them using a deep belief network to improve the accuracy of the RCNN.

The technical problem to be solved by the present invention is to provide a new smart healthcare monitoring method and system for heart disease prediction using deep learning and feature fusion. A feature fusion method combines the features extracted from sensor data and electronic medical records to generate medical data. An information gain technique eliminates irrelevant and redundant features and selects important ones, which reduces the computational burden and improves system performance. In addition, a conditional probability approach calculates a specific feature weight for each class, which further improves system performance.

In one aspect, the smart healthcare monitoring method for heart disease prediction based on ensemble deep learning and feature fusion proposed in the present invention includes the steps of: collecting data on heart disease patients through wearable sensor measurements and electronic medical tests; extracting Framingham Risk Functions (FRF) from Electronic Medical Records (EMR); combining the FRF with the data collected through wearable sensor measurements using a feature fusion approach to generate medical data on heart disease; selecting features based on an information gain technique and calculating feature weights based on conditional probability; and training an ensemble deep learning classifier to predict a patient's heart disease using the selected features and the calculated feature weights.

The step of extracting the FRF from the EMR uses text mining techniques and a rule-based engine. Morphological analysis and lemmatization algorithms are applied to all unstructured data to identify the lemma of each word, tokenization splits the unstructured text into small chunks, and an N-gram approach is used to extract the FRF.
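The extraction step described above can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the `LEMMAS` lookup table stands in for a real morphological analyzer, the `RISK_RULES` dictionary stands in for the rule-based engine, and the sample note is invented. The sketch tokenizes first and lemmatizes each token afterwards, which is one common ordering and not necessarily the order used in the invention.

```python
import re

# Hypothetical lemma table; a real system would use a morphological analyzer.
LEMMAS = {"smokes": "smoke", "smoking": "smoke", "smoked": "smoke",
          "pressures": "pressure", "diabetic": "diabetes"}

# Hypothetical rule base: lemma n-grams mapped to risk-factor names.
RISK_RULES = {
    ("high", "blood", "pressure"): "hypertension",
    ("high", "cholesterol"): "hyperlipidemia",
    ("diabetes",): "diabetes",
    ("smoke",): "smoking",
}

def tokenize(text):
    """Split unstructured text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def lemmatize(tokens):
    """Map each token to its lemma, falling back to the token itself."""
    return [LEMMAS.get(t, t) for t in tokens]

def ngrams(tokens, n):
    """Return all contiguous n-grams of the token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_risk_factors(text, max_n=3):
    """Match 1- to max_n-grams of the lemmatized text against the rule base."""
    lemmas = lemmatize(tokenize(text))
    found = set()
    for n in range(1, max_n + 1):
        for gram in ngrams(lemmas, n):
            if gram in RISK_RULES:
                found.add(RISK_RULES[gram])
    return sorted(found)

note = "Patient smokes daily. History of high blood pressure; diabetic since 2010."
print(extract_risk_factors(note))  # ['diabetes', 'hypertension', 'smoking']
```

Matching on lemma n-grams rather than raw strings lets a single rule cover inflected variants such as "smokes" and "smoking".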

In the step of selecting features based on the information gain technique and calculating feature weights based on conditional probability, the information gain technique is used to select features by measuring their importance with respect to the classification task. Entropy is used to measure the uncertainty of the system, and the information gain is found as the difference between the prior entropy and the posterior entropy of two given individual variables.
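As a hedged illustration of the entropy difference described above, the following Python sketch computes the standard information gain IG(Y; X) = H(Y) - H(Y | X) for a categorical feature X and class label Y. The function names and the toy data are assumptions for illustration and do not come from the specification.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Prior entropy of the labels minus their entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    posterior = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - posterior

# Toy data: 1 = heart disease, 0 = healthy (invented for illustration).
labels = [1, 1, 0, 0]
smoker = [1, 1, 0, 0]   # perfectly separates the classes
gender = [0, 1, 0, 1]   # carries no information about the class

print(information_gain(smoker, labels))  # 1.0
print(information_gain(gender, labels))  # 0.0
```

A selection step would then keep only the features whose gain exceeds a chosen threshold; the threshold value is a design parameter that this sketch leaves open.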

The same step also determines the feature importance for each class, uses maximization and summation functions to identify the overall feature significance across all classes, and calculates a specific feature weight for each class using a probabilistic approach.
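The passage above does not give the weighting formula, so the following Python sketch shows only one plausible reading and should not be taken as the method of the invention. It assumes a binary feature, takes the per-class importance to be the estimated conditional probability P(feature = 1 | class), uses the maximum over classes as the overall significance, and uses the sum over classes to normalize the class-specific weights. All names and data are hypothetical.

```python
from collections import defaultdict

def class_conditional(feature_values, labels):
    """Estimate P(feature = 1 | class) for a binary feature."""
    present = defaultdict(int)
    count = defaultdict(int)
    for value, label in zip(feature_values, labels):
        count[label] += 1
        present[label] += value
    return {c: present[c] / count[c] for c in count}

def class_specific_weights(feature_values, labels):
    """Return (weights per class, overall significance) for one feature.

    Assumed scheme: weight_c = P(f | c) / sum over classes of P(f | c),
    and overall significance = max over classes of P(f | c).
    """
    cond = class_conditional(feature_values, labels)
    total = sum(cond.values())
    overall = max(cond.values())
    weights = {c: (p / total if total else 0.0) for c, p in cond.items()}
    return weights, overall

# Toy data: 1 = heart disease, 0 = healthy (invented for illustration).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
smoker = [1, 1, 1, 0, 1, 0, 0, 0]

weights, overall = class_specific_weights(smoker, labels)
print(weights)   # {1: 0.75, 0: 0.25}
print(overall)   # 0.75
```

Under this assumed scheme a feature receives a different weight for each class, in contrast to general weighting methods that assign one weight per feature across all classes.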

In another aspect, the smart healthcare monitoring system for heart disease prediction based on ensemble deep learning and feature fusion proposed in the present invention includes: a gateway device that collects data on heart disease patients through wearable sensor measurements; a database that stores the data on heart disease patients collected through the gateway device and the Framingham Risk Functions (FRF) extracted from Electronic Medical Records (EMR); and a health status prediction and disease diagnosis engine. The engine combines the FRF with the data collected through wearable sensor measurements using a feature fusion approach to generate medical data on heart disease, selects features based on an information gain technique, calculates feature weights based on conditional probability, and predicts a patient's heart disease with an ensemble deep learning classifier using the selected features and the calculated feature weights.

According to embodiments of the present invention, a feature fusion method combines the features extracted from sensor data and electronic medical records to generate medical data, and an information gain technique eliminates irrelevant and redundant features and selects important ones, which reduces the computational burden and improves system performance. In addition, a conditional probability approach calculates a specific feature weight for each class, which further improves system performance. The proposed system can achieve an accuracy of 98.5%, which is higher than that of existing systems.

FIG. 1 shows the framework of a Smart Healthcare Monitoring System (SHMS) according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a method of operating the SHMS according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the workflow of the SHMS according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating an FRF extraction module according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating the fusion of FRF extracted from text data with sensor data according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a conceptualized ensemble deep learning model according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating an ontology used to make recommendations for heart disease patients according to an embodiment of the present invention.

The present invention proposes a smart healthcare monitoring method and system for heart disease prediction using deep learning and a feature fusion approach. First, the feature fusion method combines the features extracted from sensor data and electronic medical records to generate valuable medical data. Second, the information gain technique eliminates irrelevant and redundant features and selects important ones, which reduces the computational burden and improves system performance. In addition, the conditional probability approach calculates a specific feature weight for each class, which further improves system performance. Finally, an ensemble deep learning model is trained for heart disease prediction. The proposed system is evaluated on heart disease data and compared with traditional classifiers based on feature fusion, feature selection, and weighting techniques. The proposed system achieves an accuracy of 98.5%, which is higher than that of existing systems. This result shows that the proposed system is more effective for heart disease prediction than other state-of-the-art methods. Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 shows the framework of a Smart Healthcare Monitoring System (SHMS) according to an embodiment of the present invention.

The proposed Smart Healthcare Monitoring System (SHMS) for heart disease prediction based on ensemble deep learning and feature fusion includes a gateway device 130, a database 140, and a health status prediction and disease diagnosis engine 150.

The gateway device 130 collects data on heart disease patients through wearable sensor measurements.

The database 140 stores the data on heart disease patients collected through the gateway device and the Framingham Risk Functions (FRF) extracted from Electronic Medical Records (EMR).

The health status prediction and disease diagnosis engine 150 combines the FRF with the data collected through wearable sensor measurements using a feature fusion approach to generate medical data on heart disease, selects features based on an information gain technique, and calculates feature weights based on conditional probability. It then predicts a patient's heart disease with an ensemble deep learning classifier using the selected features and the calculated feature weights.

The health status prediction and disease diagnosis engine 150 includes: a data generation unit 151 that combines the FRF with the data collected through wearable sensor measurements using a feature fusion approach to generate medical data on heart disease; a feature selection and weight calculation unit 152 that selects features based on an information gain technique and calculates feature weights based on conditional probability; and a learning unit 153 that trains an ensemble deep learning classifier to predict a patient's heart disease using the selected features and the calculated feature weights.

The proposed smart healthcare monitoring system (SHMS) is divided into different layers so that the information basis of each stage can be described thoroughly. The structures of the ensemble deep learning model and of the ontology are also presented; the SHMS adopts them to predict a patient's heart disease and to recommend diet plans and activities (154).

The SHMS has two main data sources. The first source is a wireless body sensor network (WBSN) 110. The other data source is the EMR 120. The system uses a WBSN based on medical sensors to collect internal and external physiological data for daily health monitoring, such as the patient's electrocardiogram (ECG), electroencephalogram (EEG), electromyogram (EMG), blood pressure (BP), position, activity, respiration rate, blood glucose, oxygen saturation, and cholesterol level. The EMR provides patient observation reports, medical history, smoking history, diabetes history, and detailed clinical examinations. After sensing data from a heart disease patient, the proposed system transmits the data to the associated gateway device. Various devices can be used to collect the sensed data and forward it for further processing. In this system, physiological data are transmitted via Bluetooth and WiFi devices. Both the sensed data and the EMR are stored securely in a database and are regarded as medical big data.

The purpose of the SHMS is to predict a patient's disease risk based on the collected data. The health status prediction and disease diagnosis engine is therefore used to predict heart disease from the collected structured and unstructured data. The engine consists of four stages: data fusion by the data generation unit 151, preprocessing by the feature selection and weight calculation unit 152, deep learning based heart disease prediction by the learning unit 153, and ontology-based recommendation 154. In the first stage, the features extracted from structured and unstructured data are fused using the proposed fusion method. In the second stage, the data are preprocessed using data mining techniques. This stage includes data filtering, normalization, selection of valuable features using data mining techniques, and feature weighting using conditional probability. In the third stage, the preprocessed data are passed to a deep learning classifier trained on a heart disease data set for the final prediction of heart disease. In the fourth stage, the ontology is used to recommend a diet plan or activities according to the patient's health status.
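To make the first two stages more concrete, the following Python sketch fuses per-patient sensor features with risk factors extracted from the EMR and then applies min-max normalization. It is a simplified illustration under stated assumptions: fusion is modeled as a join on a hypothetical patient ID, the field names and values are invented, and min-max scaling is only one possible normalization, since the text does not name a specific one.

```python
def fuse(sensor_rows, frf_rows):
    """Feature-level fusion: join sensor features and EMR risk factors per patient.

    Patients missing from either source are skipped in this sketch.
    """
    fused = {}
    for pid, sensor in sensor_rows.items():
        if pid in frf_rows:
            fused[pid] = {**sensor, **frf_rows[pid]}
    return fused

def min_max_normalize(rows, keys):
    """Rescale the given numeric features to [0, 1] in place and return the rows."""
    for key in keys:
        values = [row[key] for row in rows.values()]
        low, high = min(values), max(values)
        span = high - low
        for row in rows.values():
            row[key] = (row[key] - low) / span if span else 0.0
    return rows

# Hypothetical sensor readings and EMR-derived risk factors.
sensor = {
    "p1": {"heart_rate": 60, "systolic_bp": 120},
    "p2": {"heart_rate": 100, "systolic_bp": 160},
    "p3": {"heart_rate": 80, "systolic_bp": 140},
}
frf = {
    "p1": {"smoking": 1, "diabetes": 0},
    "p2": {"smoking": 0, "diabetes": 1},
}

fused = min_max_normalize(fuse(sensor, frf), ["heart_rate", "systolic_bp"])
print(fused["p1"])  # {'heart_rate': 0.0, 'systolic_bp': 0.0, 'smoking': 1, 'diabetes': 0}
print(fused["p2"])  # {'heart_rate': 1.0, 'systolic_bp': 1.0, 'smoking': 0, 'diabetes': 1}
```

Patient `p3` is dropped because no EMR-derived features are available for it; a production system would need an explicit policy for such missing records.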

FIG. 2 is a flowchart illustrating a method of operating the SHMS according to an embodiment of the present invention.

The proposed smart healthcare monitoring method for heart disease prediction based on ensemble deep learning and feature fusion includes: collecting data on a heart disease patient through wearable sensor measurements and electronic medical tests (210); extracting Framingham Risk Functions (FRF) from the Electronic Medical Record (EMR) (220); combining the FRF with the data collected through wearable sensor measurements using a feature fusion approach to generate medical data on heart disease (230); selecting features based on an information gain technique and calculating feature weights based on conditional probability (240); and training an ensemble deep learning classifier to predict the patient's heart disease using the selected features and the calculated feature weights (250).

In step 210, data on a heart disease patient is collected through wearable sensor measurements and electronic medical tests.

In step 220, the FRF are extracted from the EMR. A text mining technique and a rule-based engine are used: morphological analysis and a lemmatization algorithm are applied to all unstructured data to identify the lemma of each word, tokenization splits the unstructured text into small chunks, and an N-gram approach extracts the FRF.

In step 230, a feature fusion approach is used to combine the FRF with the data collected through wearable sensor measurements and to generate medical data on heart disease.

In step 240, features are selected based on the information gain technique, and feature weights are calculated based on conditional probability. The information gain technique selects features by measuring their importance for the classification task; entropy is used to measure system uncertainty, and the difference between the prior entropy and the posterior entropy of two given discrete variables is computed. Feature importance is determined for each class, maximization and summation functions are used to identify the overall feature significance across all classes, and a probabilistic approach is used to calculate class-specific feature weights.

In step 250, the ensemble deep learning classifier is trained to predict the patient's heart disease using the features selected by the information gain technique and the feature weights calculated from conditional probability.

FIG. 3 is a diagram illustrating the workflow of the SHMS according to an embodiment of the present invention.

As shown in FIG. 3, the workflow of the proposed SHMS uses four sequential layers: the Data Collection Layer, the Data Fusion and Feature Extraction Layer, the Data Preprocessing Layer, and the Disease Prediction and Recommendation Layer.

The SHMS works with two main kinds of data. The first is the sensor data 310 collected from the wireless body sensor network (WBSN). The second is the EMR data 320 collected from the EMR. Using a WBSN built from medical sensors, the system collects internal and external physiological data for daily health monitoring, such as electrocardiogram (ECG), electroencephalogram (EEG), electromyogram (EMG), blood pressure (BP), location, activity, respiratory rate, blood glucose, oxygen saturation, and the patient's cholesterol level. The EMR provides patient observation reports, medical history, smoking history, diabetes history, and detailed clinical examinations. After sensing data from a heart disease patient, the proposed system transmits the data to the associated gateway device. Various devices can be used to collect the sensed data and forward it for further processing. In this system, physiological data is transmitted via Bluetooth and WiFi devices.

The proposed SHMS considers two types of data for heart disease prediction: patient physiological data and the EMR, as shown in FIG. 3. Patient physiological data is collected with the help of wearable sensors. Two types of sensors are used: medical sensors and activity sensors. The medical sensors include a respiratory rate sensor, an oxygen saturation sensor, a blood pressure sensor, a cholesterol level sensor, a glucose level sensor, a temperature sensor, an EEG sensor, and an ECG sensor. These are attached to the patient's body and collect physiological data without interruption. A wearable watch is also used to record physical activity and heart rate, since the patient's physical activity provides valuable information for heart disease prediction. The model uses network devices to transmit the physiological data to a private database for analysis of the heart condition. In the data collection layer, an ID is assigned to every sensor data stream (for example, S1 is the EEG sensor data), as shown in FIG. 3. The system treats these IDs as features during processing. The sensor data is also arranged in columns with its ID and numeric values for further processing.

In addition, the patient's unstructured EMR is collected to identify risk factors for heart disease prediction. The EMR contains laboratory reports, medical history, questions and observations, allergies and medications, and personal statistics. Analyzing the EMR makes it possible to extract the FRF, which provide useful information for disease prediction. The FRF include age, sex, presence of diabetes, smoking history, cholesterol level, BP, heart rate, ejection fraction (EF), body mass index (BMI), obesity (if any), and diet. However, the volume of EMR data is usually large, and each record consists of data with high-dimensional, diverse variables. A text mining method is therefore used to extract the FRF effectively from the unstructured text data.

FIG. 4 is a diagram illustrating the FRF extraction module according to an embodiment of the present invention.

As shown in FIG. 4, information is extracted from the unstructured EMR by a separate module called the FRF extraction module. The unstructured EMR is passed to this module to identify and extract the risk factors associated with heart disease. The module is divided into two sub-modules: structured FRF extraction and categorical FRF extraction. In structured FRF extraction, the system extracts factors whose values are already available in structured form. For example, it extracts age, height, heart rate, EF, sex, BP, and BMI together with the values of the structured fields. In categorical FRF extraction, the system extracts risk factors whose values fall into distinct classes (for example, diabetes history, smoking history, and family history of coronary artery disease [CAD]). These factors are extracted using a text mining technique and a rule-based engine.

The text mining technique consists of three main steps. In the first step, morphological analysis and a lemmatization algorithm are applied to all unstructured data to identify the lemma of each word. The second step is tokenization, which splits complex unstructured text into small chunks. In the third step, risk factors are extracted using an N-gram approach. The value of a risk factor generally appears as two or three adjacent words, so bigrams and trigrams are used to extract two and three adjacent factor words, respectively. Rules are created to identify the abbreviations and specific patterns that indicate sex, age, cholesterol level, and ejection fraction. For example, an EMR may contain the phrase 48y; the value 48 is extracted as the patient's age based on a set of abbreviations. Categorical FRF extraction obtains information about the patient's diabetes history, family CAD history, and smoking history based on a set of rules. To identify these histories, separate bags of words were built from diabetes, CAD, and smoking terms. With the help of the rule-based engine, the module filters out records that contain no information about heart disease, diabetes, and/or smoking. It also removes risk factors that have no numerical value. The output of this module is the set of FRF with their values or categories in a structured format.
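The extraction steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patent's module: the function names (`tokenize`, `ngrams`, `extract_frf`), the term lists, and the accepted age suffixes are all hypothetical, and lemmatization is omitted for brevity.

```python
import re

# Hypothetical bags of words; the patent does not list the actual terms.
SMOKING_TERMS = {"smoker", "smoking", "tobacco", "cigarette"}
DIABETES_TERMS = {"diabetes", "diabetic"}

def tokenize(text):
    """Lowercase a clinical note and split it into word/number tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def ngrams(tokens, n):
    """Return every run of n adjacent tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_frf(note):
    """Extract a few risk factors from free text with simple rules."""
    tokens = tokenize(note)
    frf = {}
    # Rule: an abbreviation such as "48y" encodes the patient's age.
    # The accepted suffixes are an assumption made for this sketch.
    for token in tokens:
        match = re.fullmatch(r"(\d{1,3})(y|yo|yrs)", token)
        if match:
            frf["age"] = int(match.group(1))
            break
    # Bigram rule: a factor name directly followed by its numeric value.
    for first, second in ngrams(tokens, 2):
        if first in {"cholesterol", "bmi"} and second.isdigit():
            frf[first] = int(second)
    # Categorical factors: does any term from the bag of words occur?
    frf["smoking_history"] = any(t in SMOKING_TERMS for t in tokens)
    frf["diabetes_history"] = any(t in DIABETES_TERMS for t in tokens)
    return frf

print(extract_frf("Pt is 48y male, smoker. Cholesterol 240."))
```

Note that this sketch does not handle negation ("no diabetes" would still set `diabetes_history`), which a real rule-based engine would need to address.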

FIG. 5 is a diagram illustrating the fusion of sensor data with the FRF extracted from text data according to an embodiment of the present invention.

The fusion of sensor data with the FRF extracted from text data is described with reference to FIG. 5. Fusion is the procedure of merging different data sources to produce data that is more valuable and more relevant for classification. There are three levels of fusion: data level, feature level, and decision level. Data-level fusion combines different data from heterogeneous sources that are consistent with one another. Data fusion can further be classified into the feature level and the decision level. At the feature level, features are retrieved separately from different data sets and then merged to build the set of features best suited to prediction. At the decision level, the decisions of several processes are considered in order to increase the accuracy of the system. Data-level fusion is usually undesirable because it carries a large amount of redundant data. By contrast, the feature level contains enough information to identify disease risk. The proposed system performs both data-level and feature-level fusion. The workflow for data fusion is shown in FIG. 5.

First, the patient's physiological data is collected using sensors as described above, and the FRF are extracted from the EMR. The sensor data is then fused with the other extracted FRF, such as age, sex, diabetes history, and smoking history. Finally, the sensor data and the extracted features are converted into a comma-separated values (CSV) file for easy parsing. In this way, the system finds the best combination of features related to heart disease. The main goal of the proposed system is to use sensor data together with relevant, low-dimensional features extracted from the EMR for heart disease prediction. However, the sensor data may have missing values, and the extracted features may include irrelevant information, which lowers prediction accuracy and increases the feature dimension. These problems also increase the memory requirements and complexity of classification. Data preprocessing is therefore applied before the actual processing to improve data quality and save time and memory.
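As a rough illustration of the final fusion step, the sketch below merges one patient's sensor readings with the extracted FRF and serializes the result as CSV using only the Python standard library. The field names, the `fuse_record` and `to_csv` helpers, and the choice to prefer the sensor reading when both sources report the same field are assumptions for this example; the patent does not specify them.

```python
import csv
import io

def fuse_record(sensor_row, frf_row):
    """Merge one patient's sensor readings with the FRF extracted from the EMR."""
    fused = dict(sensor_row)
    for key, value in frf_row.items():
        # Assumption: keep the sensor reading when both sources share a field.
        fused.setdefault(key, value)
    return fused

def to_csv(records):
    """Serialize fused records as CSV text with a stable, sorted column order."""
    columns = sorted({key for record in records for key in record})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

sensor = {"S1": 0.42, "heart_rate": 78}   # hypothetical sensor IDs and values
frf = {"age": 48, "heart_rate": 80}       # hypothetical FRF from the EMR
print(to_csv([fuse_record(sensor, frf)]))
```

Missing fields are written as empty cells (`restval=""`), which leaves them visible for the later filtering and imputation stage.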

Data preprocessing is the most essential step before applying ML algorithms. Real-world data tends to be noisy, incomplete, and inconsistent, so it cannot be used directly for prediction tasks. A preprocessing stage is therefore applied to represent the data effectively for heart disease prediction. Data preprocessing includes missing-data filtering, normalization, feature selection, and feature weighting.

The data collected with wearable sensors and the data extracted from the EMR contain useless and erroneous information. Wearable sensor data for heart disease prediction is corrupted by missing values and by signal artifacts such as noise, which lowers prediction accuracy or produces incorrect results. Data extracted from the EMR is considered missing when at least one value is absent. Information may be missing because the text mining technique failed to recognize an FRF value in the extracted data, or because the FRF value was never recorded. The present invention filters the data using a well-known approach called Kalman filtering. This filter cleans the data by removing noise, duplicate records, and inconsistencies. Two unsupervised filters are also used in the data filtering stage: RemoveUseless and ReplaceMissingValues. The first filter removes unnecessary attributes, allowing a maximum variance of 90%. The second filter replaces all missing values in the structured data set with the means and medians of the existing data, using the following equation.

$$\bar{x}_{i,c} = \frac{1}{n_c} \sum_{k=1}^{n_c} x_{i,k,c} \tag{1}$$

Here $x_i$, $c$, $k$, $x_{i,k,c}$, and $\bar{x}_{i,c}$ denote, respectively, a feature such as X = {"age", "cholesterol", "sex", "heart rate", "CAD history", "smoking history"}, the category level of the feature with $c$ = {0, 2}, the pattern number, the $k$-th pattern of feature $x_i$ within category $c$, and the mean of feature $x_i$ within category $c$, with $n_c$ the number of available patterns of $x_i$ in category $c$. In this work, $\bar{x}_{i,c}$ replaces the missing values of feature $x_i$ within category $c$. In addition, the extracted FRF values are used to overcome the limitations of the data generated by wearable sensors. For example, missing values are replaced with the values of the corresponding FRF attribute present in the data set.
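A minimal sketch of the class-wise mean replacement described for Equation 1, assuming each record is a dictionary, a missing value is represented by `None`, and the class label is stored under a hypothetical `"target"` key. None of these representation choices come from the patent.

```python
def impute_class_mean(rows, feature, label="target"):
    """Replace None in `feature` with the mean of that feature within the same class."""
    sums, counts = {}, {}
    for row in rows:
        value = row[feature]
        if value is not None:
            sums[row[label]] = sums.get(row[label], 0.0) + value
            counts[row[label]] = counts.get(row[label], 0) + 1
    imputed = []
    for row in rows:
        new_row = dict(row)  # leave the input records untouched
        if new_row[feature] is None and counts.get(row[label]):
            new_row[feature] = sums[row[label]] / counts[row[label]]
        imputed.append(new_row)
    return imputed

records = [
    {"cholesterol": 200, "target": 0},
    {"cholesterol": None, "target": 0},
    {"cholesterol": 220, "target": 0},
    {"cholesterol": 260, "target": 1},
    {"cholesterol": None, "target": 1},
]
filled = impute_class_mean(records, "cholesterol")
print(filled[1]["cholesterol"], filled[4]["cholesterol"])
```

If a class has no observed value for the feature, the sketch leaves the entry as `None` rather than guessing.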

The heart disease data set contains many features, and every feature takes different numerical values, which increases the difficulty of computation. A normalization technique is therefore used to normalize the data set to the range between 0 and 1 and to reduce numerical complexity during the computation of the heart disease prediction. Various methods can be used for data normalization. The proposed system uses the well-known min-max normalization method, which maps each numeric value $dv$ of the original data set to a value $dv'$ within the interval [0, 1] using the following equation.

$$dv' = \frac{dv - dv_{\min}}{dv_{\max} - dv_{\min}} \, (new\_max - new\_min) + new\_min \tag{2}$$

Here $dv'$, $dv$, $dv_{\min}$, and $dv_{\max}$ are, respectively, the normalized data value, the original data value, the minimum data value, and the maximum data value over the entire data set, and new_max and new_min denote the range of the transformed data set. We use new_max = 1 and new_min = 0. With this method, the values of all features lie within the interval [0, 1].
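The min-max normalization of Equation 2 can be written directly in Python. In this sketch the function name and the handling of a constant feature (all values equal, which would otherwise divide by zero) are assumptions not covered by the patent text.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale a list of numbers linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: map a constant feature to new_min to avoid dividing by zero.
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

print(min_max_normalize([120, 150, 180]))
```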

Patient records generally contain many irrelevant features that lower prediction accuracy. Extracting meaningful information from medical records, reducing noise by removing irrelevant features, and predicting heart disease accurately with a limited number of features are all difficult tasks. Before applying a prediction model, it is essential to remove noisy data, select the useful features that lead to accurate results, and reduce the complexity and dimensionality of the data set. Feature selection is therefore an important step that improves the clarity of the data and shortens the training time of the ensemble deep learning model. Various feature selection methods are used for medical data sets, such as sequential feature selection, weighted least squares, rough sets, and univariate feature selection. Here the information gain method is used to remove noisy features that affect the prediction results. The structured data set for heart disease prediction has 27 attributes, only some of which are useful for classifying the disease into one of the given categories. The system can learn about a particular problem based on the importance of the features in the data set. The proposed system uses information gain (IG) to select features by measuring their importance for the classification task. It uses entropy to measure system uncertainty, and it finds the difference between the prior entropy and the posterior entropy of two given discrete variables, A and B, as in the following equation.

$$IG(A \mid B) = H(A) - H(A \mid B) \tag{3}$$

Here A and B are discrete random variables, and the prior entropy of feature A can be calculated using Equation 4.

$$H(A) = -\sum_{i} P(a_i) \log_2 P(a_i) \tag{4}$$

Here $P(a_i)$ denotes the prior probability of $a_i$. The conditional (posterior) entropy of A after B has been observed can be calculated using Equation 5.

$$H(A \mid B) = -\sum_{j} P(b_j) \sum_{i} P(a_i \mid b_j) \log_2 P(a_i \mid b_j) \tag{5}$$

IG can be measured by substituting Equations 4 and 5 into Equation 3:

$$IG(A \mid B) = -\sum_{i} P(a_i) \log_2 P(a_i) + \sum_{j} P(b_j) \sum_{i} P(a_i \mid b_j) \log_2 P(a_i \mid b_j) \tag{6}$$

The proposed system uses Equation 6 to estimate the importance of each feature for the heart disease prediction task. After measuring the IG of each feature, the system discards the least important features. It reduces the feature set by removing one feature at a time, until the performance degradation stops.
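The information gain described above (prior entropy minus conditional entropy) can be computed for discrete data as in the following sketch. It treats the class label as variable A and a candidate feature as variable B, which is an assumption about how the two variables are mapped; the function names are illustrative.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())

def information_gain(labels, feature_values):
    """IG = H(labels) - H(labels | feature) for two aligned discrete sequences."""
    total = len(labels)
    conditional = 0.0
    for value, count in Counter(feature_values).items():
        subset = [l for l, f in zip(labels, feature_values) if f == value]
        conditional += (count / total) * entropy(subset)
    return entropy(labels) - conditional

labels = [1, 1, 0, 0]
print(information_gain(labels, ["yes", "yes", "no", "no"]))  # feature separates the classes
print(information_gain(labels, ["a", "b", "a", "b"]))        # feature carries no information
```

Ranking features by this score and dropping the lowest-scoring one at a time corresponds to the elimination loop described in the text; continuous features would need to be discretized first.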

Feature weighting is a method of assigning a weight to each feature according to its importance. It differs from feature selection, which completely removes redundant and irrelevant features from the training data set. Various methods have been used for feature weighting, such as the analytic hierarchy process and rule-based weighting. These methods assign each feature the same weight for all classes, which is called general feature weighting. In general feature weighting, feature importance is first determined for each class. Maximization and summation functions are then used to identify the overall feature significance across all classes. These existing approaches have several limitations. One major issue is that they rely on uncertain combination operations, which can affect the importance assigned to discriminative features. They also lack theoretical support, which increases the mean squared error (MSE) and reduces the accuracy of the prediction model. The present invention uses a probabilistic approach to obtain class-specific feature weights, which is a more refined approach than general weighting. For prediction tasks, feature weights should be specific to the class. Learning the different significance of a feature for each class improves the performance of the prediction model. The weight matrices for general and class-specific feature weighting are shown in Table 1 and Table 2, respectively.

<표 1><Table 1>

<표 2><Table 2>

In Table 1, each feature shares the same weight across all classes, whereas in Table 2 each feature has a different weight for each class. Let $A_1, A_2, \ldots, A_n$ be $n$ feature variables, and let $x$ be an instance represented by the feature vector $\langle a_1, a_2, \ldots, a_n \rangle$, where $a_i$ denotes the value of $A_i$. The class-specific feature weight of, for example, $a_i$ can be calculated using the following equation.

(7)

Here $c$ and $w_{i,c}$ denote, respectively, the value of the class variable and the feature weight of the feature value $a_i$ for class $c$. The value of $w_{i,c}$ is tied to the feature value $a_i$, so each feature value is assigned a different weight. $w_{i,c}$ ranges from 0 to 1 and indicates the importance of the feature value $a_i$ for the heart disease prediction task. This approach is useful for transforming the values of heart disease features into a more manageable form. The main purpose of obtaining these weights for feature values is to use them as initial weights in the deep learning model to obtain better prediction results.
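Equation 7 is not legible in this text, so the following is not the patent's formula. As one possible illustration of a class-specific weight that is derived from a conditional probability and lies between 0 and 1, the sketch estimates P(class | feature value) by counting. The function name and the choice of this particular estimator are assumptions.

```python
from collections import Counter

def class_specific_weights(feature_values, labels):
    """Estimate P(class | feature value) by counting.

    Returns a dict mapping (feature value, class) to a weight in [0, 1].
    This estimator is an assumption, not the patent's Equation 7.
    """
    value_counts = Counter(feature_values)
    pair_counts = Counter(zip(feature_values, labels))
    return {
        (value, cls): count / value_counts[value]
        for (value, cls), count in pair_counts.items()
    }

smoking = ["yes", "yes", "yes", "no"]   # hypothetical feature values
disease = [1, 1, 0, 0]                  # hypothetical class labels
weights = class_specific_weights(smoking, disease)
print(weights[("yes", 1)], weights[("no", 0)])
```

Unlike a general weight, the same feature value here receives a different weight for each class, which mirrors the contrast between Table 1 and Table 2.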

FIG. 6 is a diagram illustrating the conceptualized ensemble deep learning model according to an embodiment of the present invention.

The fourth layer of the proposed SHMS contains the ensemble deep learning model, whose conceptual framework is shown in FIG. 6. The model is a feed-forward network that uses error backpropagation and a gradient algorithm for the binary classification of heart disease. The model is first trained with different numbers of features from the Cleveland data set and is then used to predict the outcome for the real-time input data shown in FIG. 6. A boosting algorithm called LogitBoost is used as the meta-learning classifier; it boosts the deep learning model to achieve high accuracy. Based on the embodiments of the present invention and a literature review, LogitBoost is better suited than the AdaBoost algorithm to handling noisy data. It is designed to reduce bias and variance in order to improve classification performance.

The ensemble deep learning model consists of five layers: an input layer, three hidden layers, and an output layer. The input layer has 16 nodes, matching the number of features in the data set. It is essential to evaluate the hidden layers of the neural model with different numbers of nodes. The proposed work uses fully connected hidden layers of 20 nodes, which yield consistent performance. The attributes are fed to the deep neural network model through the input layer. They are then passed to the three hidden layers, with each value multiplied by its corresponding weight. The weighted sum is calculated and a bias is added to process the input data at each hidden layer node, as in Equation 8.

$$z = \sum_{i=1}^{n} w_i x_i + b \tag{8}$$

Here $x_i$, $w_i$, and $b$ denote the input data, the weights between nodes, and the bias, respectively. $z$ is then transformed using the rectified linear unit (ReLU) activation function, and the transformed data is passed to the output nodes to predict heart disease. The other parameters of the deep learning model include a learning rate of 0.03, and the Adam optimizer is used. The output layer has two nodes, which represent the result of the binary classification (presence or absence of heart disease).
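The weighted sum plus bias of Equation 8, followed by the ReLU activation, can be sketched for one fully connected layer as below. The tiny layer sizes and the weight values are made up for illustration; the patent's model uses 16 inputs and hidden layers of 20 nodes, and training (backpropagation, Adam) is not shown.

```python
def relu(z):
    """Rectified linear unit: pass positive values, clamp negatives to zero."""
    return max(0.0, z)

def dense_relu(inputs, weights, biases):
    """One fully connected layer: z_j = sum_i weights[j][i] * inputs[i] + biases[j], then ReLU."""
    return [
        relu(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

x = [1.0, 2.0]                      # two input features (illustrative)
W = [[0.5, 0.25], [-1.0, 0.0]]      # one weight row per hidden node
b = [0.0, 0.5]
print(dense_relu(x, W, b))
```

Stacking three such layers and ending with a two-node output layer reproduces the layer layout described in the text.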

FIG. 7 is a diagram illustrating the ontology used to make recommendations for heart disease patients according to an embodiment of the present invention.

An ontology is a collection of classes (features) and the relationships between them. An ontology uses a vocabulary to share knowledge from a specific domain among different modules. The definition of an ontology and the procedure for constructing one are discussed extensively in the prior art. The proposed ontology was developed in several steps using the Web Ontology Language (OWL). First, the classes are constructed. Second, data properties and object properties are implemented and used to determine the relationships between classes. Finally, inference rules for recommendation are written as Semantic Web Rule Language (SWRL) rules.

The proposed ontology is used for three main reasons. First, most heart disease prediction systems require physicians to re-examine the medical report before giving the patient any advice, so a new recommendation system is needed that can automatically suggest a diet plan or activities to patients in a hospital or at a remote location. Second, the recommendation module requires domain knowledge that can serve as expert knowledge for advising pre-diagnosed and diagnosed heart disease patients. Third, the symptoms of heart disease resemble those of other diseases and can confuse physicians, so a rule-based system is needed to remove confusion during recommendation. FIG. 7 presents the proposed ontology used to make recommendations for heart disease patients.

The ontology-based recommendation model is the most essential part of the proposed SHMS. The developed ontology consists of facts and rules. As shown in FIG. 7, the facts include features extracted from sensor data, EMRs, treatments, symptoms, and laboratory test results. The rules are SWRL rules, which encode the expert knowledge used to give recommendations to a patient according to the predicted health status. These rules were collected from online papers and from cardiologists and cover all types of heart disease. In the present invention, 56 rules were designed to accurately recommend activities and diet plans for heart disease patients. Some of the SWRL rules are discussed thoroughly in a previously published paper [48]. To illustrate how the rules are used, the following rule is explained.

Rule:

Description: If X is a heart disease patient and the BMI, HR, and exercise values are high, normal, and no, respectively, the recommendation is physical activity. The rule advises the patient to increase physical activity so that the calories consumed are burned.
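The rule described above can be sketched as a plain conditional. This is only an illustration of the rule's logic, not the SWRL syntax used in the ontology; the `PatientFacts` class, its field names, and the string encodings of the values are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatientFacts:
    """Hypothetical container for the facts the example rule inspects."""
    has_heart_disease: bool
    bmi: str          # assumed encoding: "low", "normal", or "high"
    heart_rate: str   # assumed encoding: "low", "normal", or "high"
    exercises: str    # assumed encoding: "yes" or "no"


def recommend(p: PatientFacts) -> Optional[str]:
    """Return the recommendation if the example rule fires, otherwise None."""
    if (p.has_heart_disease
            and p.bmi == "high"
            and p.heart_rate == "normal"
            and p.exercises == "no"):
        return "physical activity"
    return None


print(recommend(PatientFacts(True, "high", "normal", "no")))
```

In the described system the same condition would be one of the 56 SWRL rules evaluated by a reasoner over the ontology's facts; the function form above only makes the antecedent and consequent explicit.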

The deep learning model first predicts heart disease from the patient data. After the prediction, the system identifies the patient's sex, because the recommendations for male heart disease patients differ from those for female patients. The system also looks up the patient's age and identifies the group (young, adult, old) to which that age belongs. The model then recommends a diet plan or activity (for example, quitting smoking, increasing physical activity, controlling weight, or reducing meat consumption) based on the patient's sex, age, and prediction result. In addition, the module calls rescue and emergency services if the values of the extracted features are too high and the predicted result is negative.
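The decision flow in the preceding paragraph can be sketched as follows. The age boundaries (`young_max`, `adult_max`), the function names, and the returned tuples are hypothetical: the text names the three age groups but does not state their thresholds or how a recommendation is looked up.

```python
from typing import Optional, Tuple


def age_group(age: int, young_max: int = 35, adult_max: int = 60) -> str:
    """Map an age to young/adult/old. The cut-offs are placeholder assumptions."""
    if age <= young_max:
        return "young"
    if age <= adult_max:
        return "adult"
    return "old"


def select_action(predicted_positive: bool, sex: str, age: int,
                  features_too_high: bool) -> Tuple[str, Optional[Tuple[str, str]]]:
    """Decide what the recommendation module does after the prediction step."""
    # Feature values are alarming although the classifier predicted "negative":
    # per the description, rescue and emergency services are called.
    if features_too_high and not predicted_positive:
        return ("emergency", None)
    if not predicted_positive:
        return ("none", None)
    # For a positive prediction, sex and age group select the rule subset.
    return ("recommend", (sex, age_group(age)))


print(select_action(True, "female", 50, False))
```

The `(sex, age group)` pair returned for a positive prediction stands in for the key under which the ontology rules would pick a diet plan or activity.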

According to an embodiment of the present invention, a new information framework is proposed that processes both sensor data and medical records using ensemble deep learning and feature fusion techniques for heart disease prediction.

An FRF extraction module was developed to detect and extract low-dimensional heart disease risk factors from unstructured EMRs. In addition, feature-level fusion is performed to combine the sensor data with the extracted FRFs for heart disease prediction.
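A minimal sketch of risk-factor extraction followed by feature-level fusion is given below. It uses regular-expression tokenization and N-gram matching against a small rule table; lemmatization is omitted for brevity, negation ("non-smoker") is not handled, and the `RISK_TERMS` table, the function names, and the sensor field names are all assumptions made for illustration rather than the module described here.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rule table: an N-gram of lowercase tokens -> risk-factor name.
RISK_TERMS: Dict[Tuple[str, ...], str] = {
    ("smoker",): "smoking",
    ("high", "blood", "pressure"): "hypertension",
    ("diabetes",): "diabetes",
}


def tokenize(text: str) -> List[str]:
    """Split free text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous N-grams of length n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_frf(text: str, max_n: int = 3) -> Dict[str, int]:
    """Flag each known risk factor with 1 if one of its N-grams occurs in the text."""
    tokens = tokenize(text)
    found = {name: 0 for name in RISK_TERMS.values()}
    for n in range(1, max_n + 1):
        for gram in ngrams(tokens, n):
            if gram in RISK_TERMS:
                found[RISK_TERMS[gram]] = 1
    return found


def fuse(sensor_features: Dict[str, float], frf: Dict[str, int]) -> Dict[str, float]:
    """Feature-level fusion: place both feature sets in one record."""
    fused = dict(sensor_features)
    fused.update(frf)
    return fused


note = "Patient is a smoker with high blood pressure."
record = fuse({"heart_rate": 82.0, "spo2": 97.0}, extract_frf(note))
print(record)
```

Fusing at the feature level means the classifier sees one combined vector per patient, so feature selection and weighting can then operate over sensor readings and extracted risk factors together.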

An information gain (IG) approach is proposed for feature selection; it reduces noise by removing irrelevant features, which lowers the complexity and dimensionality of the data set. In addition, a conditional probability approach is used to compute specific feature weights for each class in order to increase prediction accuracy.
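Information gain for a discrete feature is the entropy of the class labels minus the entropy that remains after conditioning on the feature. The sketch below computes it with the standard library and keeps the features whose gain exceeds a threshold. The function names, the toy data, and the threshold value are assumptions for illustration; the per-class conditional-probability weighting is not sketched because its formula is not given in this passage.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature_values: Sequence, labels: Sequence) -> float:
    """IG = H(labels) - H(labels | feature) for a discrete feature."""
    total = len(labels)
    groups: Dict[object, List] = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_features(columns: Dict[str, Sequence], labels: Sequence,
                    threshold: float = 0.1) -> List[str]:
    """Keep the features whose information gain exceeds the (assumed) threshold."""
    return [name for name, values in columns.items()
            if information_gain(values, labels) > threshold]


labels = [1, 1, 0, 0]
columns = {
    "bmi": ["high", "high", "low", "low"],  # perfectly separates the classes
    "noise": ["a", "b", "a", "b"],          # carries no class information
}
print(select_features(columns, labels))
```

In the toy data, `bmi` removes all uncertainty about the label while `noise` removes none, so only `bmi` survives selection.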

A boosting algorithm called LogitBoost is used as a meta-learning classifier; it reduces bias and variance and enables the deep learning model to achieve high accuracy. In addition, an ontology based on Semantic Web Rule Language (SWRL) rules was developed to automatically recommend diet plans or activities for heart disease patients.

The proposed system improves performance and achieves a high accuracy of 98.5% in heart disease prediction, outperforming other state-of-the-art methods.

The device described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the description sometimes refers to a single processing device, but a person of ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, computer storage medium, or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiment may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to a limited set of embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the described components of the system, structure, device, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

1. A smart healthcare monitoring method for heart disease prediction based on ensemble deep learning and feature fusion, the method comprising:
collecting, by a gateway device, data on a heart disease patient through wearable sensor measurements and electronic medical tests;
extracting, by a health state prediction and disease diagnosis engine, Framingham Risk Functions (FRF) from an Electronic Medical Record (EMR);
combining, by the health state prediction and disease diagnosis engine, the FRF and the data collected through the wearable sensor measurements using a feature fusion approach to generate medical data on heart disease;
selecting, by the health state prediction and disease diagnosis engine, features based on an information gain technique, and calculating feature weights based on conditional probability; and
training, by the health state prediction and disease diagnosis engine, an ensemble deep learning classifier to predict the patient's heart disease using the features selected based on the information gain technique and the feature weights calculated based on the conditional probability.

2. The method of claim 1,
wherein the extracting of the FRF from the EMR by the health state prediction and disease diagnosis engine comprises:
using text mining techniques and a rule-based engine, applying morphological analysis and lemmatization algorithms to all unstructured data to identify the canonical form (lemma) of each word, and extracting the FRF using tokenization, which separates unstructured text into small chunks, and an N-gram approach.

3. The method of claim 1,
wherein the selecting of the features based on the information gain technique and the calculating of the feature weights based on the conditional probability comprises:
using the information gain technique to select features by measuring their importance for the classification task, wherein entropy is used to measure the uncertainty of the system and the difference between the prior and posterior entropy of two given individual variables is found.

4. The method of claim 1,
wherein the selecting of the features based on the information gain technique and the calculating of the feature weights based on the conditional probability comprises:
determining feature significance for each class, using maximum and sum functions to identify the overall feature significance across all classes, and calculating specific feature weights for each class using a probabilistic approach.

5. A smart healthcare monitoring system for heart disease prediction based on ensemble deep learning and feature fusion, the system comprising:
a gateway device that collects data on a heart disease patient through wearable sensor measurements;
a database that stores the data on the heart disease patient collected through the gateway device and Framingham Risk Functions (FRF) extracted from an Electronic Medical Record (EMR); and
a health state prediction and disease diagnosis engine that combines the FRF and the data collected through the wearable sensor measurements using a feature fusion approach to generate medical data on heart disease, selects features based on an information gain technique, calculates feature weights based on conditional probability, and predicts the patient's heart disease through an ensemble deep learning classifier using the features selected based on the information gain technique and the feature weights calculated based on the conditional probability.

6. The system of claim 5,
wherein the health state prediction and disease diagnosis engine comprises:
a data generating unit that combines the FRF and the data collected through the wearable sensor measurements using the feature fusion approach to generate the medical data on heart disease;
a feature selection and weight calculation unit that selects the features based on the information gain technique and calculates the feature weights based on the conditional probability; and
a learning unit that trains the ensemble deep learning classifier to predict the patient's heart disease using the features selected based on the information gain technique and the feature weights calculated based on the conditional probability.

7. The system of claim 6,
wherein the feature selection and weight calculation unit
uses the information gain technique to select features by measuring their importance for the classification task, uses entropy to measure the uncertainty of the system, and finds the difference between the prior and posterior entropy of two given individual variables.

8. The system of claim 6,
wherein the feature selection and weight calculation unit
determines feature significance for each class, uses maximum and sum functions to identify the overall feature significance across all classes, and calculates specific feature weights for each class using a probabilistic approach.