KR102306115B1

KR102306115B1 - A Novel Healthcare Monitoring Method and Apparatus Using Wearable Sensors and Social Networking Data

Info

Publication number: KR102306115B1
Application number: KR1020190079878A
Authority: KR
Inventors: 곽경섭; 샤커; 리아즈; 알리펄만
Original assignee: 인하대학교 산학협력단
Priority date: 2019-07-03
Filing date: 2019-07-03
Publication date: 2021-09-28
Also published as: KR20210004058A

Abstract

A healthcare monitoring method and apparatus using wearable sensors and social network data are presented. The proposed healthcare monitoring apparatus includes: a data collection unit that collects physiological data from wearable devices, drug and symptom data from smart terminals, and discussion data of patients and doctors on social networks through application programming interfaces (APIs); a data storage unit that is connected to a private cloud server and stores the collected physiological data, drug and symptom data, and discussion data; and a big data analytics engine that performs computation and classification on the stored data, including feature polarity identification, word embedding and ontology-based feature extraction, feature extraction from wearable sensor data, data component analysis, Bi-LSTM-based diabetes and blood pressure (BP) classification, and drug side effect prediction.

Description

A Novel Healthcare Monitoring Method and Apparatus Using Wearable Sensors and Social Networking Data

The present invention relates to a healthcare monitoring method and apparatus using wearable sensors and social network data.

Current technology in the medical industry plays a key role in providing efficient methods for collecting physiological information about individual patients. Collecting patient data with smartphones and wearable sensors also enables efficient medical monitoring. People also use social networking to share their feelings and opinions about particular drugs, which is useful for observing emotional disorders and predicting drug side effects. However, continuous patient monitoring generates large volumes of medical data, and user-generated medical data on social networking sites is voluminous, unstructured, and unpredictable. Advanced approaches are therefore needed that can process the vast amounts of medical data drawn from diverse sources such as smartphones, wearable sensors, medical records, and social networks (that is, big data). Current medical monitoring systems are not efficient at extracting important information from device and social networking data, and have difficulty analyzing it effectively. In addition, conventional machine learning techniques cannot handle and process the extracted medical data for abnormality prediction. Accordingly, a new healthcare monitoring architecture based on a cloud environment and a big data analytics engine is proposed to store and analyze medical data precisely and to improve classification accuracy.

An object of the present invention is to provide a new healthcare monitoring method and apparatus, based on a cloud environment and a big data analytics engine, that stores and analyzes medical data precisely and improves classification accuracy.

In one aspect, the healthcare monitoring apparatus using wearable sensors and social network data proposed in the present invention includes: a data collection unit that collects physiological data from wearable devices, drug and symptom data from smart terminals, and discussion data of patients and doctors on social networks through application programming interfaces (APIs); a data storage unit that is connected to a private cloud server and stores the collected physiological data, drug and symptom data, and discussion data; and a big data analytics engine that performs computation and classification on the stored data, including feature polarity identification, word embedding and ontology-based feature extraction, feature extraction from wearable sensor data, data component analysis, Bi-LSTM-based diabetes and BP classification, and drug side effect prediction.

The big data analytics engine operates on stored data that includes sensor data, medical records, and social network content. It performs pre-analysis of the sensor data, preprocessing and filtering of the sensor data, preprocessing of the medical records, preprocessing of the social network content, and data filtering to remove inconsistencies and noise, and then converts the data into a structured form.

To give the neural network model additional information for understanding the meaning of words, the big data analytics engine performs ontology-based feature extraction: it retrieves all ontology information and integrates it with the word embeddings used for classification. To extract information from the ontologies, it uses statistical measures that are defined mathematically, namely term frequency (TF) and term frequency-inverse document frequency (TF-IDF).

The big data analytics engine extracts features from the wearable sensor data using the information gain (IG) method, computed with WEKA (Waikato Environment for Knowledge Analysis), to find the worth of each attribute. Entropy is used to measure uncertainty, and IG is defined as the difference between the entropy before and the entropy after.
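The entropy-difference idea behind information gain can be sketched in a few lines of Python. This is a minimal illustration using the textbook definition, not the WEKA implementation named above; the function names and the toy glucose data are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels before the split minus the weighted entropy after it."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    after = sum(len(group) / total * entropy(group) for group in groups.values())
    return entropy(labels) - after


# Hypothetical toy data: a discretised glucose reading against a health label.
glucose = ["high", "high", "low", "low"]
status = ["diabetic", "diabetic", "normal", "normal"]
gain = information_gain(glucose, status)
```

An attribute that separates the classes perfectly, as in the toy data, recovers all of the initial entropy; an attribute unrelated to the label yields a gain of zero.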

In another aspect, the healthcare monitoring method using wearable sensors and social network data proposed in the present invention includes: collecting, by a data collection unit, physiological data from wearable devices, drug and symptom data from smart terminals, and discussion data of patients and doctors on social networks through application programming interfaces (APIs); storing, by a data storage unit connected to a private cloud server, the collected physiological data, drug and symptom data, and discussion data; and performing, by a big data analytics engine, computation and classification on the stored data, including feature polarity identification, word embedding and ontology-based feature extraction, feature extraction from wearable sensor data, data component analysis, Bi-LSTM-based diabetes and BP classification, and drug side effect prediction.

According to embodiments of the present invention, the proposed big data analytics engine is based on data mining techniques, ontologies, and bidirectional long short-term memory (Bi-LSTM). The data mining techniques efficiently preprocess the medical data, extract useful features, and reduce the dimensionality of the data. The proposed ontology provides semantic knowledge about entities and aspects and their relationships in the diabetes and blood pressure domains, and the Bi-LSTM accurately classifies the medical data to predict drug side effects and abnormal conditions in patients. The proposed apparatus can also be used for patient classification with medical data related to diabetes, BP, mental health, and drug reviews, so that the proposed model handles heterogeneous data accurately and improves the accuracy of health status classification and drug side effect prediction.

FIG. 1 is a diagram showing the configuration of a healthcare monitoring apparatus using wearable sensors and social network data according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining a big data cloud server and HDFS according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining a big data analytics engine according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining embedding and ontology-based feature extraction according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining LSTM-based classification and prediction according to an embodiment of the present invention.
FIG. 6 is a flowchart illustrating a healthcare monitoring method using wearable sensors and social network data according to an embodiment of the present invention.
FIG. 7 is a graph showing the accuracy and MAE of LSTM-based classification with and without the ontology according to an embodiment of the present invention.
FIG. 8 is a graph showing results obtained using classification models together with PCA and IG according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram showing the configuration of a healthcare monitoring apparatus using wearable sensors and social network data according to an embodiment of the present invention.

The proposed healthcare monitoring apparatus using wearable sensors and social network data includes a data collection unit 120, a data storage unit 130, and a big data analytics engine 140.

The data collection unit 120 collects physiological data from wearable devices, drug and symptom data from smart terminals, and discussion data of patients and doctors on social networks through application programming interfaces (APIs).

For example, the data collection unit 120 may collect data from data sources 110 that include sensors 111, hospital medical records 112, hospital SNS 113, and patient SNS 114. In other words, it collects physiological information from wearable devices, drug and symptom information from smartphones, and social network discussions of patients and doctors through APIs.

Wearable devices, that is, smart sensors and wearables such as blood pressure monitors, glucometer sensors, smartwatches, pulse oximeters, accelerometers, temperature sensors, and weight scales, monitor the patient's physiological signs in real time, including blood pressure, blood glucose level, heart rate, stress rate, oxygen saturation, temperature, and weight.

Smartphones are used to collect diet, exercise, and other activity information from diabetes and BP patients, as well as personal information (for example, age, gender, height, and other details).

Medical records include the patient's records describing his or her medical history (for example, treatments, laboratory tests, and drug intake).

Data from social networking platforms and web pages, such as drug reviews and patients' emotional posts, is collected in order to predict stress and depression levels, identify side effects of diabetes medications in the context of diet and lifestyle, and improve patient treatment and knowledge.

The data storage unit 130 is connected to the private cloud server and stores the physiological data, drug and symptom data, and discussion data collected in this way.

As described above, the large volume of medical data collected from diverse sources is difficult to store and process. The proposed apparatus is therefore connected to a private cloud server, Amazon S3, which is highly scalable and secure, to store patient information. The data storage process is described in more detail with reference to FIG. 2.

FIG. 2 is a diagram for explaining a big data cloud server and HDFS according to an embodiment of the present invention.

As shown in FIG. 2, wearable sensor and social networking data are transferred from the cloud server 210 to the Hadoop Distributed File System (HDFS) 232. HDFS 232 is a distributed file storage system that provides high bandwidth for transferring data to the HBase 231 cluster. HBase 231 runs on top of HDFS 232. Flume 220 transfers data from the cloud server to the Hadoop ecosystem.

According to an embodiment of the present invention, Apache Pig 242 is used to pass the data extracted by the wearable devices to MapReduce 242. MapReduce 242 has two main functions, Map and Reduce. The Map task collects values from the big data as key-value pairs. The Reduce function stores the set of values for a particular key. The stored data is then passed to the analytics engine 250.
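The division of labour between Map and Reduce described above can be illustrated in plain Python. This is only a single-process sketch of the pattern, not Hadoop or Pig code; the record layout `(patient_id, parameter, value)` and the function names are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit one (key, value) pair per raw record."""
    for patient_id, parameter, value in records:
        yield (patient_id, parameter), value


def reduce_phase(pairs):
    """Reduce: gather the set of values that share a key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


# Hypothetical readings: two blood glucose values (S1) and one heart rate value (S5).
records = [("p1", "S1", 110.0), ("p1", "S1", 145.0), ("p2", "S5", 72.0)]
grouped = reduce_phase(map_phase(records))
```

In a real cluster the pairs emitted by the mappers are shuffled across nodes before reduction; the sketch keeps everything in one process to show only the key-value flow.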

The big data analytics engine 140 performs computation and classification on the stored data, including feature polarity identification, word embedding and ontology-based feature extraction from the ontology-based semantic knowledge 143, feature extraction from wearable sensor data, data component analysis, Bi-LSTM-based diabetes and BP classification, and drug side effect prediction. The stored data consists of sensor data, medical records, and social networking data. However, real-world big data is extremely difficult to process because of inconsistencies, missing values, noise, differing formats, large size, and high dimensionality.

The big data analytics engine 140 is the most important component of the proposed framework. It is divided into two sub-layers: a data computation unit 141 and a data classification unit 142. The data computation unit 141 contains sub-modules such as data preprocessing 141a, data pre-analysis 141b, feature extraction 141c, word dimensionality reduction 141d, and word embedding 141e. Here, the ontology-based semantic knowledge 143 is used, together with a soft computing approach, to process and analyze the data needed to extract the required information. In the data classification unit 142, a Bi-LSTM with ontology 142a is used to classify diabetes, BP, mental health, and drug side effects. The data classification unit 142 intelligently analyzes multidimensional big data on diabetes and BP to gain decision-making insight from the data and to provide patients with a personalized diabetes and BP healthcare system. The data analysis process is described in more detail with reference to FIG. 3.

FIG. 3 is a diagram for explaining a big data analytics engine according to an embodiment of the present invention.

As shown in FIG. 3, the big data analytics engine performs pre-analysis of sensor data, preprocessing and filtering of sensor data, preprocessing of medical records, and preprocessing of social network content.

Table 1

| Resource | ID | Parameter | Description |
|---|---|---|---|
| Sensor | S1 | Blood glucose | Blood glucose (mg/dL) |
| Sensor | S2 | Body temperature | Patient's current body temperature |
| Sensor | S3 | Blood pressure | Systolic blood pressure (mmHg); diastolic blood pressure (mmHg) |
| Sensor | S4 | Oxygen saturation | Patient's SpO₂ consumption (mmHg) |
| Sensor | S5 | Heart rate | Patient's heart rate (bpm) |
| Sensor | S6 | ECG | Patient's electrocardiogram using an ECG sensor |
| Sensor | S7 | EEG | Patient's electroencephalogram using an EEG sensor |
| Sensor | S8 | Stress | Patient's stress calculated from ECG + EEG patterns |
| Smartphone | SP1 | Age | Patient's age (from date of birth) |
| Smartphone | SP2 | Height | Measurement of the patient's height |
| Smartphone | SP3 | BMI | Body mass index (kg/m²) |
| Smartphone | SP4 | Gender | Patient's gender (0/1) |
| Smartphone | SP6 | Activity | Lifestyle: sedentary, lightly active, moderately active, active, or very active |
| Hospital medical records | MR1 | Lipoprotein level | Low-density lipoprotein level (LDL cholesterol); high-density lipoprotein level (HDL cholesterol) |
| Hospital medical records | MR2 | Hemoglobin | Patient's glycohemoglobin (A1c) (%) |
| Hospital medical records | MR3 | Blood glucose | Patient's blood test |
| Hospital medical records | MR4 | Serum creatinine | Patient's blood test |
| Hospital medical records | MR5 | Triglycerides | Patient's blood test |
| Hospital medical records | MR6 | Cholesterol | Patient's blood test |
| Hospital medical records | MR7 | AST (SGOT) | Blood test for liver damage |
| Hospital medical records | MR8 | ALT (SGPT) | Blood test for liver damage |
| Hospital medical records | MR9 | Drug intake | Extracted from the prescription list |
| Hospital medical records | MR10 | Smoking | Yes / No (self-test) |
| Hospital medical records | MR11 | Drinking | Yes / No (self-test) |
| Hospital medical records | MR12 | Indigestion | Yes / No (self-test) |
| Hospital medical records | MR13 | Family history | Yes / No (self-test) |
| Hospital medical records | MR14 | Patient's medical history | Yes / No (self-test) |

In the pre-analysis of sensor data, different parameters are sensed from diabetes and BP patients using sensors and smartphones, as listed in Table 1. Other parameters are also extracted from the patient's body. First, the data set is converted to a comma-separated values (CSV) file so that it can be parsed easily. A name is assigned to each attribute (parameter), and the attributes are laid out as columns of numeric values. The IDs of the sensor data are then labeled with the actual sensor names (for example, S1 is the ID for blood glucose). Instead of using an explicit year and time for age and activity, respectively, the age attribute is split into day, month, and year, and the activity time is split into morning, afternoon, and evening. The features generated in this way give insight into patient health, such as patterns of normal and abnormal conditions according to whether a serious condition occurs at the start or end of a month, or between a start month and an end month. The final data set is then uploaded to the Hadoop cloud environment for further processing.
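The pre-analysis steps (renaming IDs, splitting a date-valued attribute, bucketing the time of day, and writing CSV) might look like the following sketch. The column names, the hour boundaries for morning, afternoon, and evening, and the subset of sensor IDs are assumptions made for illustration; the description does not specify them.

```python
import csv
import io
from datetime import date

# Illustrative subset of the sensor IDs; the snake_case names are assumed.
SENSOR_NAMES = {"S1": "blood_glucose", "S3": "blood_pressure", "S5": "heart_rate"}


def time_of_day(hour):
    """Bucket an hour (0-23) into a coarse period; the boundaries are assumed."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    return "evening"


def split_date(value):
    """Split a date-valued attribute into day, month, and year columns."""
    return {"day": value.day, "month": value.month, "year": value.year}


def to_csv(readings):
    """Write (sensor_id, value, date, hour) readings as CSV text."""
    fields = ["parameter", "value", "day", "month", "year", "period"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    for sensor_id, value, taken_on, hour in readings:
        row = {
            "parameter": SENSOR_NAMES.get(sensor_id, sensor_id),
            "value": value,
            "period": time_of_day(hour),
        }
        row.update(split_date(taken_on))
        writer.writerow(row)
    return buffer.getvalue()


csv_text = to_csv([("S1", 126, date(2019, 7, 3), 8)])
```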

Sensor data is preprocessed and filtered before analysis. The data is filtered to remove inconsistencies and noise, and is cleaned by removing ASCII characters. Noise is removed using the well-known Kalman filtering approach. In addition, an unsupervised filter called ReplaceMissingValues replaces every missing numeric value in the data set with the means and modes of the available data. Useless attributes are removed, with a maximum variance of 90%, using an unsupervised filter called RemoveUseless. The numeric values are then normalized with a normalization filter so that they lie between 0 and 1 for all classifications. After these steps, EmEditor is used to split the data set into n data files, which are uploaded to the Hadoop cloud environment for further processing.
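To make three of these steps concrete, the sketch below shows mean imputation, a scalar Kalman filter, and min-max normalization in plain Python. It is not the WEKA filter code; the function names, the noise variances, and the sample heart-rate series are assumptions, and missing values are filled first here only because the simple filter cannot handle gaps.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the values that are present."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def kalman_smooth(measurements, process_var=1e-3, measurement_var=1.0):
    """One-dimensional Kalman filter for a slowly varying signal (assumed variances)."""
    estimate, error = measurements[0], 1.0
    smoothed = [estimate]
    for z in measurements[1:]:
        error += process_var                 # predict: uncertainty grows
        gain = error / (error + measurement_var)
        estimate += gain * (z - estimate)    # update: move toward the measurement
        error *= 1.0 - gain
        smoothed.append(estimate)
    return smoothed


def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical heart-rate series (bpm) with one missing reading.
heart_rate = [72.0, None, 74.0, 76.0]
filled = fill_missing_with_mean(heart_rate)
smoothed = kalman_smooth(filled)
normalized = min_max_normalize(smoothed)
```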

In the preprocessing of medical records (MR), information about the drugs contained in the data on medicines prescribed for the diagnosis is used. Some MR attributes can be used for patient classification. For data analysis, an ID and a reference value (0, 1, or 2) are therefore assigned to each MR attribute. The reference values 0, 1, and 2 indicate the patient's health status (normal, prediabetic, and diabetic, respectively). MR data is also used to overcome the limitations of sensor-generated data; for example, missing values are replaced with the current MR attribute values in the data set.

In the preprocessing of social network content, the following steps are applied to convert the data into a structured form. Because these steps also remove data that is not useful, they make it easier to extract features and opinion words.

First, stop-word removal is performed. Words such as prepositions (to, in, of), articles (a, an, the), symbols (#, @, and so on), and URLs in the corpus data can be removed without disturbing the meaning of a document, whereas leaving them in reduces the accuracy of text classification. According to an embodiment of the present invention, this content is removed using the well-known Rainbow method.
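A minimal sketch of stop-word, symbol, and URL removal is shown below. It does not use the Rainbow tool named above; the stop-word list is limited to the examples given in the text, and the function name and sample post are assumptions.

```python
import re

# Illustrative stop-word list taken from the examples in the description.
STOP_WORDS = {"to", "in", "of", "a", "an", "the"}


def remove_noise(text):
    """Drop URLs, the symbols # and @, and stop words; return the remaining words."""
    text = re.sub(r"https?://\S+", " ", text)  # URLs
    text = re.sub(r"[#@]", " ", text)          # symbols
    return [word for word in text.lower().split() if word not in STOP_WORDS]


post = "The metformin made me dizzy in the morning #diabetes http://example.com/x"
cleaned = remove_noise(post)
```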

Next, in the tokenization step, the complex text of the corpus is broken into small terms, or tokens, by removing whitespace and delimiters. An n-gram tokenizer is applied to remove the whitespace and delimiters. The output is then stored for further analysis, such as part-of-speech (PoS) tagging and lemmatization of the extracted words.
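Tokenization followed by n-gram construction can be sketched as follows. The delimiter set and the function names are assumptions chosen for the example; the description does not name a specific tokenizer implementation.

```python
import re


def tokenize(text):
    """Split text on whitespace and common delimiters, dropping empty tokens."""
    return [token for token in re.split(r"[\s,.;:!?()]+", text) if token]


def ngrams(tokens, n):
    """Return all contiguous n-token sequences, joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = tokenize("blood sugar dropped, felt dizzy.")
bigrams = ngrams(tokens, 2)
```

Bigrams keep multi-word terms such as "blood sugar" together, which matters when such terms are later matched against ontology concepts.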

In the PoS tagging step, the corpus text is divided into sentences, and Stanford CoreNLP (Stanford Core Natural Language Processing) is then used for PoS tagging.

In the stemming and lemmatization step, a suffix-dropping algorithm is applied for stemming. Lemmatization gives the lemma of each word used in the text. After lemmatization, the lexical information of each word is easy to obtain; for example, "blood sugar" is related to "blood glucose". The stemmed and lemmatized words are then used for further processing.
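A suffix-dropping stemmer can be as simple as the sketch below. The suffix list and the minimum stem length are assumptions made for illustration; a production stemmer uses many more rules, and the description does not specify which rule set is applied.

```python
# Suffixes are tried in order, longest first; the list is an illustrative assumption.
SUFFIXES = ("ing", "ed", "es", "s")


def drop_suffix(word, min_stem=3):
    """Remove the first matching suffix if the remaining stem is long enough."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


stems = [drop_suffix(w) for w in ["injecting", "tested", "tablets", "glucose"]]
```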

In the feature transformation step, patients use unusual words on social networks (for example, "depressssssed") that affect the classifier's results. A run of repeated characters is therefore reduced to the ordinary spelling of the word (for example, "depressssssed" becomes "depressed").
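One common way to implement this transformation is a regular expression that shortens any run of three or more identical characters to two. That rule is an assumption for this sketch, not a rule stated in the description; it happens to restore the example word, but words whose correct spelling has a single letter would need a dictionary check afterwards.

```python
import re


def squeeze_repeats(word):
    """Collapse any run of three or more identical characters to exactly two."""
    return re.sub(r"(.)\1{2,}", r"\1\1", word)


normalized_word = squeeze_repeats("depressssssed")
```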

Next, feature polarity identification is performed. The proposed apparatus uses content posted on social networks to detect stress and depression in patients. It also performs several tasks on drug reviews to determine patients' opinions about the efficacy and side effects of diabetes medications. A sentiment analysis approach is used for both of these tasks, so it is important to find the feature polarity and to label documents for sentiment classification. After the social network and web page content has been preprocessed, SentiWordNet (SWN) is used to determine the polarity of feature opinion words. The feature polarity results are then accumulated to find the polarity of the whole document. Word embedding and ontology-based feature extraction are performed next; this process is described in more detail with reference to FIG. 4.
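Accumulating word-level polarity into a document-level label can be sketched as below. The lexicon is a hand-made stand-in with hypothetical scores, not SentiWordNet data, and the thresholds at zero are an assumption.

```python
# Hypothetical polarity scores; a real system would look these up in SentiWordNet.
LEXICON = {"dizzy": -0.5, "nausea": -0.625, "better": 0.5, "effective": 0.75}


def document_polarity(tokens, lexicon=LEXICON):
    """Sum the polarity of known opinion words and label the whole document."""
    score = sum(lexicon.get(token, 0.0) for token in tokens)
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score


label, score = document_polarity(["metformin", "effective", "dizzy"])
```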

FIG. 4 is a diagram for explaining embedding and ontology-based feature extraction according to an embodiment of the present invention.

As shown in FIG. 4, a Word2vec skip-gram model with 200-dimensional vectors was trained on the data set. This Word2vec model is used to train an ML classifier to detect associations between features. ML classifiers generally miss the underlying domain-specific semantics of features. An ontology can represent the semantics of a given domain. Biomedical ontologies available online cover a variety of topics concerning diabetes, depression, hypertension, drugs, and food.
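The skip-gram objective trains on (center word, context word) pairs drawn from a sliding window. The sketch below only generates those training pairs; it does not train the 200-dimensional vectors, and the window size and sample sentence are assumptions.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


pairs = skipgram_pairs(["insulin", "lowers", "blood", "glucose"], window=1)
```

A skip-gram model then learns vectors such that each center word predicts its context words, which is why words used in similar contexts end up with similar embeddings.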

Table 2

| Ontology and abbreviation | Definition | Number of classes | Number of properties | Number of individuals |
|---|---|---|---|---|
| Ontology for Nutritional Studies (ONS) | Provides a description of complex nutritional studies. | 3442 | 66 | 104 |
| BioMedBridges Diabetes Ontology (DIAB) | Represents relationships between diabetes phenotypes for text mining. | 375 | 4 | 0 |
| Diabetes Mellitus Treatment Ontology (DMTO) | Provides interoperability for diabetes treatment. | 10700 | 315 | 63 |
| Human Disease Ontology (DOID) | Represents the concepts of rare diseases. | 12694 | 15 | 0 |
| Drug Target Ontology (DTO) | Provides information on the classification of drug target data. | 10075 | 0 | 0 |
| FHIR and SSN-based Type 1 Diabetes Ontology (FASTO) | Provides insulin management information for diabetes patients. | 9577 | 822 | 460 |

As shown in Table 2, the latest versions of ontologies rich in concepts and relations are used: the Ontology for Nutritional Studies (ONS), the BioMedBridges Diabetes Ontology (DIAB), the Diabetes Mellitus Treatment Ontology (DMTO), the Human Disease Ontology (DOID), the Drug Target Ontology (DTO), and the Fast Healthcare Interoperability Resources (FHIR) and Semantic Sensor Network (SSN) based Type 1 Diabetes Ontology (FASTO). These ontologies provide additional information that helps the neural network model understand the meaning of unusual words.

All ontology information is retrieved and integrated with the word embeddings for the classifier. Each ontology is treated as a bag of words (BOW). To extract information from the ontologies, two well-known statistical measures are used: term frequency (TF) and term frequency-inverse document frequency (TF-IDF). Here, each concept or property of an ontology is treated as a term, and the ontology is treated as a document. TF is therefore the frequency with which a term occurs in an ontology. TF is defined mathematically in the following equation.

(1)

If TF(Term, Onto) is greater than 0, this step is repeated to extract the specific concept. TF-IDF selects representative terms for information extraction. IDF mainly reduces the weight of words that appear in all ontologies (for example, patient, disease, and hospital). If a term occurs in many ontologies, or in many concepts of a single ontology, it is a common term and may not be needed for information extraction. For such a term the result of the log function approaches 0, so its TF-IDF value is small. IDF is defined in the following equation.

IDF(Term, Ontos) = \log \frac{N}{|\{Onto \in Ontos : Term \in Onto\}|}    (2)

where N is the total number of ontologies in the database or the total number of concepts in an ontology, and |{Onto ∈ Ontos : Term ∈ Onto}| is the number of ontologies, or of concepts within an ontology, in which the term appears. Common terms are removed using the following TF-IDF equation.

TF-IDF(Term, Onto, Ontos) = TF(Term, Onto) \times IDF(Term, Ontos)    (3)

The TF-IDF result shows how essential an ontology feature is to an ontology within the ontology corpus, or how important it is to a concept within an ontology.
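The weighting described above can be sketched as follows, treating each ontology as a bag of labels. TF is taken here as the raw count of a term in an ontology, which is an assumption because the normalization is not spelled out in the text; the three toy vocabularies are hypothetical and do not reproduce real ontology content.

```python
import math
from collections import Counter

def tf(term, onto):
    """Raw count of `term` among the concept/property labels of one ontology."""
    return Counter(onto)[term]

def idf(term, ontos):
    """log(N / number of ontologies containing the term); 0.0 if the term is absent."""
    containing = sum(1 for onto in ontos.values() if term in onto)
    return math.log(len(ontos) / containing) if containing else 0.0

def tf_idf(term, name, ontos):
    """TF-IDF of `term` in the ontology called `name`."""
    return tf(term, ontos[name]) * idf(term, ontos)

# Toy stand-ins for ontology vocabularies (hypothetical labels).
ontos = {
    "DIAB": ["patient", "insulin", "glucose", "insulin"],
    "DTO": ["patient", "kinase", "receptor"],
    "ONS": ["patient", "diet", "glucose"],
}
common = tf_idf("patient", "DIAB", ontos)    # appears in every ontology
specific = tf_idf("insulin", "DIAB", ontos)  # appears in only one ontology
```

The term that occurs in every toy ontology receives a weight of zero, while the term confined to one ontology receives the largest weight, which is the filtering effect the text attributes to IDF.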

Next, the features extracted from wearable sensor data are described. The information gain (IG) method is used for feature extraction; it benefits the classifier by reducing noisy and irrelevant features. The information gain is computed with the Waikato Environment for Knowledge Analysis (WEKA). Results are obtained by applying the information gain filter "InfoGainAttributeEval" as the evaluator on the diabetes and BP data sets. Once the attribute values are identified, the IG measure is tied to the reduction in entropy of the training data set. This approach determines the worth of an attribute by computing its IG with respect to the class. The IG method uses entropy to measure the uncertainty of the system and takes the difference between the prior entropy and the posterior entropy. As in Equation (4), it specifies the amount of additional information about A provided by B.

IG(A|B) = H(A) - H(A|B)    (4)

Here, A and B are separate variables. A is the feature, and its prior entropy can be measured using Equation (5).

H(A) = -\sum_{i} P(a_i) \log_2 P(a_i)    (5)

where P(a_i) denotes the prior probability of the discrete value a_i of A. The conditional entropy of A given B, that is, the posterior entropy, can be defined as in Equations (6) and (7).

H(A|B) = \sum_{j} P(b_j) H(A | B = b_j)    (6)

H(A|B) = -\sum_{j} P(b_j) \sum_{i} P(a_i | b_j) \log_2 P(a_i | b_j)    (7)

The information gain can be calculated by substituting Equations (5) and (7) into Equation (4), as in Equation (8).

IG(A|B) = -\sum_{i} P(a_i) \log_2 P(a_i) + \sum_{j} P(b_j) \sum_{i} P(a_i | b_j) \log_2 P(a_i | b_j)    (8)
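As a rough illustration of the entropy difference described above, the sketch below computes IG(A|B) = H(A) - H(A|B) for discrete columns using base-2 logarithms. It is not the WEKA implementation; the toy columns and the choice of which column plays the role of A and of B are assumptions made for the example.

```python
import math
from collections import Counter

def entropy(values):
    """H(A) = -sum_i P(a_i) * log2 P(a_i) over the observed discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def conditional_entropy(a_values, b_values):
    """H(A|B) = sum_j P(b_j) * H(A | B = b_j)."""
    n = len(a_values)
    groups = {}
    for a, b in zip(a_values, b_values):
        groups.setdefault(b, []).append(a)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def information_gain(a_values, b_values):
    """IG(A|B) = H(A) - H(A|B)."""
    return entropy(a_values) - conditional_entropy(a_values, b_values)

# Hypothetical toy data: A is a diagnosis column, B is a candidate attribute.
labels = ["diabetic", "diabetic", "normal", "normal"]
glucose = ["high", "high", "low", "low"]  # determines the label completely
age = ["old", "young", "old", "young"]    # carries no information about the label
ig_glucose = information_gain(labels, glucose)
ig_age = information_gain(labels, age)
```

An attribute with a gain near zero, such as the toy `age` column, is the kind of irrelevant feature that an information gain ranking would discard.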

Next, principal component analysis is performed. The dimensionality of the data set increases after preprocessing and the generation of additional attributes. This causes problems such as reduced classification accuracy, poorer model fit, and increased time complexity. Therefore, principal component analysis (PCA), a statistical approach to dimensionality reduction, is used. PCA transforms p-dimensional data X into q-dimensional data Y with minimal loss of information.
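The projection from p dimensions to q dimensions can be sketched for the simplest case q = 1. The example below uses power iteration on the sample covariance matrix in place of a full eigendecomposition, which is a substitution made to keep the sketch dependency-free; the function name, iteration count, and toy readings are all assumptions.

```python
def first_principal_component(rows, iterations=200):
    """Return (unit direction, 1-D projections) for the top principal component."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(p)]
    centered = [[r[k] - means[k] for k in range(p)] for r in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Power iteration: repeated multiplication converges to the dominant eigenvector.
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Project each centered sample onto the direction (p dimensions -> 1 dimension).
    scores = [sum(c[k] * v[k] for k in range(p)) for c in centered]
    return v, scores

# Hypothetical 2-D readings that vary almost entirely along the diagonal.
data = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8]]
direction, scores = first_principal_component(data)
```

For q greater than 1, a library eigendecomposition or singular value decomposition would normally be used instead of repeating this procedure by hand.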

Then, Bi-LSTM-based classification of diabetes and BP and prediction of drug side effects are performed. The Bi-LSTM is used to classify a patient's diabetes, BP, mental health, and the side effects of diabetes drugs.

FIG. 5 is a diagram for explaining LSTM-based classification and prediction according to an embodiment of the present invention.

FIG. 5 shows the structure of the Bi-LSTM for classifying diabetes, BP, mental health, and drug side effects. A grid search optimization algorithm is applied to identify the optimal values of the hyperparameters of the LSTM model. The optimal values selected for the dropout rate, number of epochs, batch size, and learning rate are 0.3, 30, 32, and 0.001, respectively. As shown in FIG. 5, the LSTM is used to classify four different types of data sets. For diabetes classification, the Pima Indians Diabetes data set was obtained from the UCI Machine Learning Repository. The data set has eight input attributes, but only six are used to train the diabetes classification model: age, BMI, BP, plasma glucose, diabetes pedigree function, and 2-hour serum insulin. The BP classification model was trained on the PhysioNet Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database. This data set consists of numerous vital-sign samples, including BP and heart rate (HR). The data set for drug side effect prediction was obtained from the UCI repository. It consists of six attributes, but only two (drug name and patient review) are considered in the proposed work. In the present invention, four Bi-LSTM-based classifier models were developed: LSTM^a, LSTM^b, LSTM^c, and LSTM^d.
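Grid search evaluates every combination of candidate hyperparameter values and keeps the best one. The sketch below illustrates the procedure only: the candidate lists are hypothetical, and `toy_score` is a stand-in that peaks at the values reported in the text, whereas a real run would train and validate the Bi-LSTM for each combination.

```python
from itertools import product

def grid_search(grid, score):
    """Evaluate every combination in `grid` and return the best-scoring one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score

# Hypothetical candidate values; only the selected optimum comes from the text.
grid = {
    "dropout": [0.2, 0.3, 0.5],
    "epochs": [10, 30, 50],
    "batch_size": [16, 32, 64],
    "learning_rate": [0.01, 0.001],
}

def toy_score(params):
    """Stand-in for validation accuracy (assumption, not a trained model)."""
    target = {"dropout": 0.3, "epochs": 30, "batch_size": 32, "learning_rate": 0.001}
    return -sum(abs(params[k] - target[k]) / target[k] for k in target)

best, _ = grid_search(grid, toy_score)
```

The cost grows with the product of the list lengths (54 combinations here), which is why the candidate lists are usually kept short.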

Diabetes classification uses the age, family, gender, activity, BMI, blood pressure, and blood glucose features. At each time step t, the feature vector X_t is fed into the LSTM^a unit to predict the diabetes class {normal, prediabetes, diabetes}. The medical rules for classifying diabetes and blood pressure patients are presented in Table 3.

For blood pressure and electrocardiogram monitoring, wearable sensors are used to measure each patient's BP and heart rate. The extracted information is used to construct features. After feature generation and preprocessing, a Bi-LSTM-based model named LSTM^b was developed; this model can retain the temporal patterns present in past sequential data. For BP classification, all parameters and features generated at each time step t are converted into the category vector {0, 1, 2}.

For mental health monitoring, patient messages and posts are extracted from social networks and filtered using supervised and unsupervised approaches. A 200-dimensional Word2vec model is trained, and the resulting word sequences are fed to the LSTM^c model. The output of LSTM^c is then passed to a softmax function, which predicts a sentiment label (positive, neutral, negative, or strongly negative) for each sentence of the posted content. The rules for classifying a patient's mental health from the predicted polarity are shown in Table 4.
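The final softmax step can be sketched as follows: the classifier's raw output scores for a sentence are turned into probabilities, and the most probable sentiment label is selected. The label order and the example scores are assumptions and are not produced by a trained model.

```python
import math

LABELS = ["positive", "neutral", "negative", "strongly negative"]

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(logits):
    """Return the most probable sentiment label and the full distribution."""
    probs = softmax(logits)
    return LABELS[probs.index(max(probs))], probs

# Hypothetical output scores for one sentence.
label, probs = predict_label([0.2, 0.1, 2.5, 1.0])
```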

Category | Sentiment polarity of patient posts and comments (positive/neutral/negative/strongly negative) | Sentiment polarity toward antidiabetic drugs (positive/neutral/negative)
Patient mental health classification | |
Happy | Positive | -
Normal | Neutral | -
Depressed | Negative | -
Stressed | Strongly negative | -
Drug side effect prediction | |
No side effects | - | Positive
Moderate side effects | - | Neutral
Serious side effects | - | Negative

For drug side effect prediction, the diabetes and BP drugs the patient is currently taking are used as input queries, and reviews of them are retrieved from various websites. After filtering and preprocessing, each review is automatically labeled as no side effects, moderate side effects, or serious side effects. For this labeling, the sentiment polarity of each sentence is identified. Then, as shown in Table 4, positive, neutral, and negative sentiment polarities are taken to indicate no side effects, moderate side effects, and serious side effects, respectively. A 200-dimensional Word2vec model is trained to represent each drug review as a sequence of words for the LSTM^d model. The output of LSTM^d is then fed to a softmax function, which predicts the drug side effects.
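The rules of Table 4 amount to two lookup tables from predicted sentiment polarity to a category. A minimal sketch follows; the dictionary and function names are illustrative assumptions.

```python
# Polarity of patient posts and comments -> mental health category (Table 4).
MENTAL_HEALTH = {
    "positive": "happy",
    "neutral": "normal",
    "negative": "depressed",
    "strongly negative": "stressed",
}

# Polarity toward antidiabetic drugs -> side effect category (Table 4).
SIDE_EFFECT = {
    "positive": "no side effects",
    "neutral": "moderate side effects",
    "negative": "serious side effects",
}

def classify_mental_health(polarity):
    """Map a predicted post polarity to a mental health category."""
    return MENTAL_HEALTH[polarity]

def classify_side_effect(polarity):
    """Map a predicted drug-review polarity to a side effect category."""
    return SIDE_EFFECT[polarity]
```

For example, `classify_side_effect("neutral")` returns `"moderate side effects"`. The drug table has no entry for "strongly negative", so that label raises `KeyError` there, mirroring the three-level polarity scale used for drug reviews.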

Referring again to FIG. 1, the analysis data resulting from the computation and classification of the data are displayed through the data display unit 150 for healthcare monitoring 151 and recommendation 152. The results obtained for diabetes classification are presented in Table 5.

Proposed model and other classifiers | Precision (P) (%) | Recall (R) (%) | F-measure (FM) (%) | Accuracy (Ac) (%) | RMSE | MAE
CNN | 62 | 66 | 63 | 66 | 58 | 34
MLP | 67 | 67 | 67 | 68 | 51 | 34
SVM | 67 | 70 | 68 | 70 | 54 | 30
Fuzzy classifier | 73 | 72 | 65 | 72 | 52 | 26
Logistic regression | 70 | 68 | 69 | 69 | 55 | 32
Random forest | 67 | 70 | 66 | 70 | 46 | 42
KNN | 57 | 65 | 59 | 65 | 52 | 50
LSTM^a | 74 | 75 | 75 | 75 | 50 | 26

The other classifiers (CNN, MLP, SVM, fuzzy classifier, logistic regression, random forest, and KNN) were compared with the proposed LSTM^a model on the Pima Indians data set. The proposed LSTM^a achieved the highest accuracy (75%) and the lowest MAE (26%) among the classifiers. This high accuracy indicates that LSTM^a stores more of the information relevant to diabetes classification in its memory cells, and the lowest MAE likewise indicates better performance. The fuzzy classifier obtained the lowest RMSE (46) among the classifiers. Based on this experiment, the other classifiers were observed to be time-consuming and to lose performance even with a small number of features. LSTM, in contrast, outperformed the other classifiers in diabetes prediction.
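The six measures reported in the tables can be computed as sketched below. The text does not state how the measures are averaged over classes or how labels are coded for RMSE and MAE, so this sketch assumes a binary task with labels coded 0 and 1; the example labels are hypothetical.

```python
import math

def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F-measure, accuracy, MAE and RMSE for 0/1 labels."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"P": precision, "R": recall, "FM": f_measure,
            "Ac": accuracy, "MAE": mae, "RMSE": rmse}

# Hypothetical ground truth and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
m = binary_metrics(y_true, y_pred)
```

Multiplying each value by 100 gives figures on the same scale as the tables.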

Table 6 shows the results obtained by the proposed model and the other classifiers for BP classification.

Proposed model and other classifiers | Precision (P) (%) | Recall (R) (%) | F-measure (FM) (%) | Accuracy (Ac) (%) | RMSE | MAE
CNN | 71 | 70 | 69 | 70 | 54 | 30
MLP | 80 | 80 | 80 | 80 | 37 | 22
SVM | 83 | 75 | 73 | 74 | 50 | 25
Fuzzy classifier | 84 | 83 | 83 | 83 | 40 | 16
Logistic regression | 72 | 72 | 71 | 72 | 51 | 29
Random forest | 78 | 70 | 67 | 70 | 45 | 42
KNN | 60 | 58 | 56 | 58 | 64 | 42
LSTM^b | 89 | 87 | 87 | 88 | 34 | 12

LSTM^b was observed to achieve the highest accuracy (88%), compared with CNN (70%), MLP (80%), SVM (73%), the fuzzy classifier (83%), logistic regression (72%), random forest (70%), and KNN (58%). In addition, the MAE and RMSE of the proposed model were 12 and 34, respectively, lower than those of the other classifiers.

Table 7 shows the performance of the classifiers for mental health classification.

Proposed model and other classifiers | Precision (P) (%) | Recall (R) (%) | F-measure (FM) (%) | Accuracy (Ac) (%) | RMSE | MAE
CNN | 66 | 66 | 66 | 66 | 56 | 34
MLP | 72 | 71 | 71 | 72 | 47 | 29
SVM | 68 | 68 | 68 | 68 | 56 | 32
Fuzzy classifier | 60 | 58 | 56 | 59 | 67 | 45
Logistic regression | 70 | 75 | 72 | 71 | 44 | 34
Random forest | 60 | 60 | 59 | 60 | 58 | 40
KNN | 64 | 63 | 64 | 64 | 60 | 36
LSTM^c | 87 | 90 | 87 | 89 | 35 | 15

According to Table 7, LSTM^c with softmax shows the highest accuracy (89%) among the classifiers. In addition, the MAE and RMSE of LSTM^c are 15 and 35, respectively, lower than those of the other classifiers.

Proposed model and other classifiers | Precision (P) (%) | Recall (R) (%) | F-measure (FM) (%) | Accuracy (Ac) (%) | RMSE | MAE
CNN | 76 | 68 | 65 | 68 | 62 | 39
MLP | 82 | 81 | 81 | 81 | 39 | 20
SVM | 82 | 82 | 82 | 82 | 36 | 18
Fuzzy classifier | 79 | 78 | 78 | 78 | 41 | 28
Logistic regression | 84 | 82 | 83 | 83 | 36 | 17
Random forest | 76 | 66 | 63 | 66 | 58 | 33
KNN | 77 | 75 | 75 | 76 | 49 | 24
LSTM^d | 88 | 90 | 89 | 90 | 32 | 13

To evaluate the classification results of the proposed model on drug reviews, the LSTM^d model was compared with the other classifiers, as shown in Table 8. LSTM^d and logistic regression obtained the highest accuracies, 90% and 83%, respectively. The other classifiers achieved lower accuracy than LSTM^d and logistic regression. The RMSE and MAE of LSTM^d are 32 and 13, respectively, lower than those of the other classifiers. The accuracy of CNN is very low, and its RMSE is very high compared with the other classifiers. This indicates that LSTM^d can handle long sequences of word vectors as the size of the word vectors increases. In contrast, CNN overfits as the length of the word sequence and the dimensionality of the word embeddings increase.

FIG. 6 is a flowchart illustrating a healthcare monitoring method using wearable sensors and social network data according to an embodiment of the present invention.

The proposed healthcare monitoring method using wearable sensors and social network data includes: collecting, through a data collection unit, physiological data from wearable devices, drug and symptom data from smart terminals, and discussion data of patients and doctors on social networks through application programming interfaces (APIs) (610); storing the collected physiological data, drug and symptom data, and discussion data in a data storage unit connected to a personal cloud server (620); performing, through a big data analysis engine, computation and classification of the stored data, including feature polarity identification, word embedding and ontology-based feature extraction, feature extraction from wearable sensor data, data component analysis, Bi-LSTM-based diabetes and BP classification, and drug side effect prediction (630); and displaying analysis data according to the results of the computation and classification (640).

In step 610, the data collection unit collects physiological data from wearable devices, drug and symptom data from smart terminals, and discussion data of patients and doctors on social networks through application programming interfaces (APIs).

In step 620, the data storage unit, which is connected to the personal cloud server, stores the collected physiological data, drug and symptom data, and discussion data.

The stored data include sensor data, medical records, and social network content. The data are converted into a structured form after pre-analysis of the sensor data, preprocessing and filtering of the sensor data, preprocessing of the medical records, preprocessing of the social network content, and data filtering to remove inconsistencies and noise.

To provide additional information to the neural network model that interprets word meanings, ontology-based feature extraction is performed: all ontology information is retrieved and integrated with the word embeddings for classification, and information is extracted from the ontologies using mathematically defined statistical measures, including term frequency (TF) and term frequency-inverse document frequency (TF-IDF).

Features of the wearable sensor data are extracted, and the worth of each attribute is determined, using the information gain (IG) method computed with the Waikato Environment for Knowledge Analysis (WEKA). Entropy is used to measure uncertainty, and IG is defined as the difference between the prior entropy and the posterior entropy.

In step 630, the big data analysis engine performs computation and classification of the stored data, including feature polarity identification, word embedding and ontology-based feature extraction, feature extraction from wearable sensor data, data component analysis, Bi-LSTM-based diabetes and BP classification, and drug side effect prediction.

In step 640, the analysis data according to the results of the computation and classification are displayed.
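Steps 610 to 640 form a linear pipeline of collection, storage, analysis, and display. The skeleton below only illustrates that control flow; the class name, method names, and record layout are hypothetical, and every method body is a placeholder rather than the apparatus described above.

```python
from dataclasses import dataclass, field

@dataclass
class MonitoringPipeline:
    """Hypothetical skeleton of steps 610-640; method bodies are placeholders."""
    store: list = field(default_factory=list)

    def collect(self, wearable, smartphone, social):
        """Step 610: gather the three data sources into one record."""
        return {"physiological": wearable, "medication": smartphone,
                "discussion": social}

    def save(self, record):
        """Step 620: keep the record (stands in for the cloud-connected storage)."""
        self.store.append(record)

    def analyze(self):
        """Step 630: a real engine would run polarity identification, feature
        extraction, and the Bi-LSTM classifiers; this stub only counts records."""
        return {"records_analyzed": len(self.store)}

    def display(self, result):
        """Step 640: format the analysis result for presentation."""
        return f"analyzed {result['records_analyzed']} record(s)"

pipeline = MonitoringPipeline()
pipeline.save(pipeline.collect({"bp": "120/80"}, {"drug": "metformin"}, ["post"]))
message = pipeline.display(pipeline.analyze())
```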

FIG. 7 is a graph showing the accuracy and MAE of LSTM-based classification with and without the ontology according to an embodiment of the present invention.

FIG. 7 shows the accuracy and MAE of the proposed LSTM^c and LSTM^d models without and with the ontology. As can be seen, the proposed ontology-based LSTM improves considerably on the plain LSTM. In sentiment text classification, the accuracy of LSTM^c was 84%, rising to 87% when the ontology was used, while the MAE decreased by only 1. In drug side effect classification, the accuracy and MAE of LSTM^d were 90% and 14, respectively; when the ontology was used with LSTM^d, the accuracy increased to 93% and the MAE decreased by 1.

FIG. 8 is a graph showing the results obtained using the classification models together with PCA and IG according to an embodiment of the present invention.

The results obtained using the classification models together with PCA and IG are shown in FIG. 8. Based on these results, the accuracy of LSTM^a, LSTM^b, ontology-LSTM^c, and ontology-LSTM^d with PCA and IG increased by 4%, 1%, 3%, and 1%, respectively.

According to embodiments of the present invention, the proposed big data analysis engine is based on data mining techniques, ontologies, and bidirectional long short-term memory (Bi-LSTM). The data mining techniques efficiently preprocess healthcare data, extract useful features, and reduce the dimensionality of the data. The proposed ontologies provide semantic knowledge about entities and aspects and their relationships in the diabetes and blood pressure domains, and the Bi-LSTM accurately classifies healthcare data to predict drug side effects and abnormal conditions in patients. In addition, the proposed apparatus is used for patient classification with healthcare data related to diabetes, BP, mental health, and drug reviews, so that the proposed model can accurately process heterogeneous data and improve the accuracy of health condition classification and drug side effect prediction.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device is sometimes described as a single device, but a person of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, computer storage medium, or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored in one or more computer-readable recording media.

The method according to the embodiment may be implemented in the form of program instructions that can be executed by various computer means and recorded in a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to limited embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the described components of the system, structure, apparatus, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the following claims.

Claims

A healthcare monitoring device comprising:
a data collection unit that collects physiological data from wearable devices, drug and symptom data from smart terminals, and discussion data of patients and doctors on social networks through application programming interfaces (APIs);
a data storage unit that is connected to a personal cloud server and stores the collected physiological data, drug and symptom data, and discussion data; and
a big data analysis engine that performs calculation and classification of data, including feature polarity identification for the stored data, word embedding and ontology-based feature extraction, feature extraction from wearable sensor data, data component analysis, Bi-LSTM-based diabetes and BP classification, and drug side effect prediction,
wherein the data storage unit receives wearable sensor data, social network content, and web page content from the personal cloud server and transmits them through Flume to HDFS (Hadoop Distributed File System), HDFS being a distributed file storage system that provides high bandwidth for transfer, uses Apache Pig to pass the wearable sensor data, social network content, and web page content to MapReduce, and performs, through MapReduce, a map task that collects values from the big data of paired key values,
wherein the stored data includes wearable sensor data, medical records, and social network content, and the big data analysis engine performs pre-analysis of the wearable sensor data, pre-processing and filtering of the wearable sensor data, pre-processing of the medical records, and pre-processing of the social network content and web page content, determines the polarity of feature opinion words using SWN (SentiWordNet), accumulates the feature polarity results to find the polarity of the entire document, and performs word embedding and ontology-based feature extraction, and
wherein the pre-processing of the medical records uses information about the drugs contained in the drug data prescribed for the diagnosis, together with some of the characteristics of the medical records, to classify patients; for data analysis, each attribute of the medical record is assigned an ID and a reference value representing the patient's health status; and, to overcome the limitations of sensor-generated data, medical record data are used to replace missing sensor-generated data values with the current medical record attribute values.
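
As an illustration of the polarity accumulation recited above, the following minimal Python sketch sums per-word polarity scores into a document-level polarity. The lexicon `TOY_LEXICON`, its scores, and the function names are hypothetical stand-ins for SentiWordNet (SWN) lookups, and the scoring rule (positive score minus negative score, then the sign of the sum) is an assumption; the claim does not specify it.

```python
# Hypothetical mini-lexicon standing in for SentiWordNet (SWN):
# word -> (positive score, negative score). Values are illustrative only.
TOY_LEXICON = {
    "effective": (0.625, 0.0),
    "stable": (0.5, 0.125),
    "dizzy": (0.0, 0.625),
    "nausea": (0.0, 0.75),
}


def word_polarity(word, lexicon=TOY_LEXICON):
    """Polarity of one opinion word: positive score minus negative score.

    Words missing from the lexicon are treated as neutral (0.0).
    """
    pos, neg = lexicon.get(word.lower(), (0.0, 0.0))
    return pos - neg


def document_polarity(tokens, lexicon=TOY_LEXICON):
    """Accumulate word polarities and label the whole document."""
    total = sum(word_polarity(t, lexicon) for t in tokens)
    if total > 0:
        return "positive", total
    if total < 0:
        return "negative", total
    return "neutral", total


label, score = document_polarity(
    "the drug was effective but I felt dizzy and nausea".split()
)
```

A real pipeline would look scores up per word sense in SentiWordNet and restrict the sum to opinion words attached to extracted features; the sketch only shows the accumulation step.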

(Deleted)

The healthcare monitoring device of claim 1,
wherein the big data analysis engine, in order to provide additional information to a neural network model that understands the meaning of words, performs ontology-based feature extraction to retrieve all ontology information, integrates that information with word embeddings for classification, and extracts information from the ontology, the extraction being defined mathematically using statistical methods including term frequency (TF) and TF-IDF (term frequency-inverse document frequency).
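
The TF and TF-IDF statistics named in the claim above can be sketched as follows. This is a minimal illustration, assuming documents are already tokenized into ontology terms; the function name `tf_idf`, the example terms, and the specific IDF variant (unsmoothed natural logarithm of N divided by document frequency) are assumptions, since the claim does not fix a formula.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    TF  = count of term in document / number of tokens in document
    IDF = ln(number of documents / number of documents containing term)
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return weights


# Hypothetical tokenized posts, used only for illustration.
docs = [["insulin", "dose", "insulin"], ["dose", "headache"]]
weights = tf_idf(docs)
```

A term that occurs in every document (here `dose`) receives weight zero under this variant, which is why smoothed IDF formulas are often preferred in practice.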

The healthcare monitoring device of claim 1,
wherein the big data analysis engine extracts features of the wearable sensor data using the information gain (IG) method calculated through the Waikato Environment for Knowledge Analysis (WEKA), finds the value of each attribute, uses entropy to measure uncertainty, and defines the entropy difference with respect to the previous entropy.
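
The entropy-difference idea in the claim above corresponds to the standard information gain definition, IG(S, A) = H(S) - sum over values v of (|S_v| / |S|) * H(S_v). The following sketch computes it directly in Python rather than through WEKA; the function names and sample data are hypothetical, and it assumes the sensor attribute has already been discretized into categories.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting
    the samples by the (discretized) attribute values."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical discretized glucose readings and diagnosis labels.
glucose = ["high", "high", "normal", "normal"]
diagnosis = ["diabetic", "diabetic", "healthy", "healthy"]
gain = information_gain(glucose, diagnosis)
```

Attributes can then be ranked by their gain, and low-gain attributes dropped before classification.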

A healthcare monitoring method comprising:
collecting, through a data collection unit, physiological data from a wearable device, drug and symptom data from a smart terminal, and discussion data of patients and doctors on social networks through application programming interfaces (APIs);
storing, by a data storage unit connected to a personal cloud server, the collected physiological data, drug and symptom data, and discussion data; and
analyzing the stored data through a big data analysis engine to perform calculation and classification of data, including feature polarity identification, word embedding and ontology-based feature extraction, feature extraction from wearable sensor data, data component analysis, Bi-LSTM-based diabetes and BP classification, and drug side effect prediction,
wherein the storing comprises receiving wearable sensor data, social network content, and web page content from the personal cloud server and transmitting them through Flume to HDFS (Hadoop Distributed File System), HDFS being a distributed file storage system that provides high bandwidth for transfer, using Apache Pig to pass the wearable sensor data, social network content, and web page content to MapReduce, and performing, through MapReduce, a map task that collects values from the big data of paired key values,
wherein the stored data includes wearable sensor data, medical records, and social network content, and the analyzing comprises performing pre-analysis of the wearable sensor data, pre-processing and filtering of the wearable sensor data, pre-processing of the medical records, and pre-processing of the social network content and web page content, determining the polarity of feature opinion words using SWN (SentiWordNet), accumulating the feature polarity results to find the polarity of the entire document, and performing word embedding and ontology-based feature extraction, and
wherein the pre-processing of the medical records uses information about the drugs contained in the drug data prescribed for the diagnosis, together with some of the characteristics of the medical records, to classify patients; for data analysis, each attribute of the medical record is assigned an ID and a reference value representing the patient's health status; and, to overcome the limitations of sensor-generated data, medical record data are used to replace missing sensor-generated data values with the current medical record attribute values.
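
The replacement of missing sensor values recited above can be sketched as a simple lookup. This is a minimal illustration, assuming a missing reading is represented as `None` and that sensor readings and medical records share attribute names; the function name `fill_missing` and the attribute names are hypothetical.

```python
def fill_missing(sensor_reading, medical_record):
    """Replace missing (None) sensor values with the current value of the
    same attribute in the medical record.

    Attributes absent from the medical record stay missing (None).
    """
    return {
        attr: (medical_record.get(attr) if value is None else value)
        for attr, value in sensor_reading.items()
    }


# Hypothetical reading with two sensor dropouts.
reading = {"heart_rate": 72, "systolic_bp": None, "glucose": None}
record = {"systolic_bp": 130, "age": 58}
completed = fill_missing(reading, record)
```

Values that the sensor did report are never overwritten, so the medical record acts only as a fallback.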

(Deleted)

The healthcare monitoring method of claim 5,
wherein the analyzing of the stored data through the big data analysis engine comprises, in order to provide additional information to a neural network model that understands the meaning of words, performing ontology-based feature extraction to retrieve all ontology information, integrating that information with word embeddings for classification, and extracting information from the ontology, the extraction being defined mathematically using statistical methods including term frequency (TF) and TF-IDF (term frequency-inverse document frequency).

The healthcare monitoring method of claim 5,
wherein the analyzing of the stored data through the big data analysis engine comprises extracting features of the wearable sensor data using the information gain (IG) method calculated through the Waikato Environment for Knowledge Analysis (WEKA), finding the value of each attribute, using entropy to measure uncertainty, and defining the entropy difference with respect to the previous entropy.