KR20210004057A

KR20210004057A - Machine Learning and Semantic Knowledge-based Big Data Analysis: A Novel Healthcare Monitoring Method and Apparatus Using Wearable Sensors and Social Networking Data

Info

Publication number: KR20210004057A
Application number: KR1020190079877A
Authority: KR
Inventors: 곽경섭
Original assignee: 인하대학교 산학협력단
Priority date: 2019-07-03
Filing date: 2019-07-03
Publication date: 2021-01-13
Also published as: KR102217307B1

Abstract

The present invention provides a novel medical monitoring method and apparatus that use wearable sensors and social networking data, analyzed through big data analytics based on semantic knowledge and machine learning, to improve classification accuracy. According to the present invention, the medical monitoring apparatus comprises: a data collection unit that collects data acquired by a wearable device or sensor, patients' medical records, discussion data from patients' and doctors' social networks, and data from medical webpages; a data storage unit of a big data cloud server, connected to individual cloud servers, that stores the collected data; and a big data analysis engine that performs data analysis, including pre-analysis of the collected data, preprocessing and filtering of the collected sensor data, preprocessing of medical records, and preprocessing of social network content, and that performs feature polarity identification and document labeling. The big data analysis engine extracts word-embedding and ontology-based features from the text portion of the stored data, extracts features from the wearable device or sensor data, applies principal component analysis (PCA) as a statistical approach to dimensionality reduction, and uses a Bi-LSTM to classify diabetes and BP status and to predict medication side effects.

Description

Machine Learning and Semantic Knowledge-based Big Data Analysis: A Novel Healthcare Monitoring Method and Apparatus Using Wearable Sensors and Social Networking Data

The present invention relates to a novel medical monitoring method and apparatus that use wearable sensors and social networking data through machine learning and semantic knowledge-based big data analysis.

With advances in information technology, the tools and devices used in the medical industry have become digital. This digitization has created a strong link between patients and technology, and a main purpose of advanced devices is to make patient medical monitoring easy. Various devices, such as smartphones and wearable sensors, are used in daily life: smartphones may include sensors that capture information about the human body, and wearable devices can collect vast amounts of physiological patient information. At the same time, social change has increased people's demand for a healthy life, and health problems have increasingly shifted toward chronic conditions. Extracting valuable information from these devices and analyzing it effectively has therefore become a new challenge for medical monitoring systems for patients with diabetes and abnormal blood pressure (BP). However, the data generated by these devices is unstructured and therefore difficult to process, and its volume is growing rapidly, requiring huge storage capacity. There is thus a need for a smart methodology and a cloud-based medical architecture that can accurately monitor diabetes and BP patients, store medical data, and perform predictive analysis on both structured and unstructured data.

Various factors, such as emotional state and social stress, can affect a patient's health. The use of social networking in the medical industry has recently grown rapidly. People with diabetes and abnormal BP share their feelings and experiences on social network services (SNS), exchanging valuable information and motivating each other to fight diabetes and high or low BP. Diabetic patients also post opinions about specific drugs, and new patients read these opinions and react to the same medication. A healthcare monitoring system for diabetes or abnormal BP therefore needs social networking data, both to identify patients' emotional disorders from their posts and to monitor drug side effects from drug reviews. However, SNS information about patients' emotions and drug experiences is unstructured and unpredictable, which makes it difficult for a medical monitoring system to extract and analyze the information needed to monitor patients' mental health and predict drug side effects.

In recent years, machine learning (ML) techniques such as decision trees, support vector machines (SVMs), k-nearest neighbors (KNN), fuzzy logic, and multilayer perceptrons (MLPs) have been used to monitor diabetic and BP patients and provide appropriate treatment. However, continuous patient monitoring generates large amounts of medical data, such as sensor data, patient profiles, medical records, laboratory tests, and doctors' notes. Both medical and social networking data have grown significantly over the past few years; this data, which is both structured and unstructured, is called big data. Conventional approaches and ML techniques may not handle such data well when extracting meaningful information and predicting abnormal conditions. Moreover, such data may be of little use to the healthcare industry unless it is processed intelligently and in real time, which requires a big data cloud platform and advanced deep learning methods such as long short-term memory (LSTM) networks.

New developments in medical monitoring systems remain a major challenge because they require large multidisciplinary efforts. Given new medical technologies and social changes, traditional systems are not efficient enough under these new conditions, so a new framework based on new methods is needed to solve the problem. Traditional techniques can, however, still be used within the new system.

Current medical technology plays a key role in providing efficient ways to collect physiological information about individual patients, and smartphones and wearable sensors enable efficient medical monitoring by collecting patient data. People also use social networking to share feelings and opinions about particular drugs, which is useful for observing emotional disorders and predicting drug side effects. However, continuous patient monitoring generates large amounts of medical data, and the medical data generated by users of social networking sites is voluminous, unstructured, and unpredictable. Advanced approaches are therefore required that can process vast amounts of medical data drawn from various sources, including smartphones, wearable sensors, medical records, and social networks (i.e., big data). Current medical monitoring systems are not efficient at extracting important information from devices and social networking data and have difficulty analyzing it effectively, and existing machine learning techniques cannot handle and process the extracted medical data for predicting abnormal conditions. Therefore, we propose a new healthcare monitoring architecture based on a cloud environment and a big data analysis engine to accurately store and analyze medical data and improve classification accuracy.

An object of the present invention is to provide a new healthcare monitoring method and apparatus based on a cloud environment and a big data analysis engine, in order to accurately store and analyze medical data and improve classification accuracy.

In one aspect, the medical monitoring apparatus proposed in the present invention, which uses wearable sensors and social networking data with machine learning and semantic knowledge-based big data analysis, includes: a data collection unit that collects data acquired through wearable devices or sensors, patients' medical records, discussion data from patients' and doctors' social networks, and data from medical webpages; a data storage unit of a big data cloud server, connected to a personal cloud server, that stores the collected data; and a big data analysis engine that performs data analysis, including pre-analysis of the collected data, preprocessing and filtering of the collected sensor data, preprocessing of medical records, and preprocessing of social network content, and that performs feature polarity identification and document labeling. The big data analysis engine extracts word-embedding and ontology-based features from the text portion of the stored data, extracts features from the wearable device or sensor data, applies principal component analysis as a statistical approach to dimensionality reduction, and uses a Bi-LSTM to classify diabetes and BP status and predict drug side effects.

The data collection unit collects patients' medical records to determine their treatment and medical history, extracts patients' content from SNS to determine their feelings, emotions, and stress, and collects patient reviews of drugs from medical webpages to identify side effects of the medications they currently take.

For pre-analysis, the big data analysis engine converts the collected data into CSV files so that it can be parsed, presents it as columns of numeric values, and replaces the IDs in the sensor data with the actual sensor names. It filters the data to remove inconsistencies and noise, preprocesses the various medical records, including laboratory tests, self-test answers, and data on the drugs taken, and preprocesses social network content through stop-word removal, tokenization, part-of-speech (PoS) tagging, and feature conversion, thereby converting the data into a structured form.
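The sensor and text preprocessing steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the CSV column names (`patient`, `sensor_id`, `value`), the `SENSOR_NAMES` mapping, the single plausible-value range used as a noise filter, and the small stop-word list are all hypothetical. PoS tagging and feature conversion are omitted; a real pipeline would typically use an NLP library for those.

```python
import csv
import io
import re

# Hypothetical mapping from raw sensor IDs to human-readable sensor names.
SENSOR_NAMES = {"s01": "heart_rate", "s02": "systolic_bp", "s03": "blood_glucose"}

# Tiny illustrative stop-word list; a real system would use a complete list.
STOP_WORDS = {"a", "an", "the", "is", "and", "my", "i", "to", "of", "this"}


def preprocess_sensor_csv(text, low=0.0, high=400.0):
    """Parse sensor CSV, rename sensor IDs, and drop missing or implausible readings.

    The single [low, high] range is a simplification (assumption); real filters
    would use per-sensor ranges or statistical outlier detection.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        name = SENSOR_NAMES.get(record["sensor_id"])
        value = record["value"].strip()
        if name is None or not value:
            continue  # unknown sensor ID or missing value
        reading = float(value)
        if not (low <= reading <= high):
            continue  # treat out-of-range values as noise
        rows.append({"patient": record["patient"], "sensor": name, "value": reading})
    return rows


def preprocess_post(post):
    """Lower-case a social network post, tokenize it, and remove stop words."""
    tokens = re.findall(r"[a-z0-9]+", post.lower())
    return [t for t in tokens if t not in STOP_WORDS]
```

For example, `preprocess_post("My sugar is high and I feel dizzy.")` keeps only the content words `sugar`, `high`, `feel`, and `dizzy`.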

The big data analysis engine uses the discussion data from patients' and doctors' social networks to detect patients' stress and depression through a sentiment analysis approach, and analyzes patients' drug reviews to identify opinions on the effectiveness and side effects of diabetes medications.
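Feature polarity identification and document labeling can be illustrated with a simple lexicon-based scorer. The sketch below assumes a hypothetical polarity lexicon and a one-token negation rule; the document does not specify how polarity is computed, so this only stands in for whatever sentiment method is actually used.

```python
# Hypothetical polarity lexicon: +1 for positive terms, -1 for negative terms.
LEXICON = {
    "better": 1, "effective": 1, "helped": 1, "calm": 1,
    "stress": -1, "depressed": -1, "nausea": -1, "dizzy": -1, "worse": -1,
}
NEGATIONS = {"not", "no", "never"}


def polarity_score(tokens):
    """Sum lexicon polarities; a negation word flips the polarity of the next token only."""
    score = 0
    negate = False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
            continue
        value = LEXICON.get(tok, 0)
        score += -value if negate else value
        negate = False  # negation scope ends after one token
    return score


def label_document(tokens):
    """Label a tokenized post or drug review as positive, negative, or neutral."""
    score = polarity_score(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A review such as "metformin helped my sugar but nausea is worse" scores -1 and is labeled negative, which is the kind of signal a side-effect monitor would flag.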

The big data analysis engine applies a word embedding approach to represent words as numeric values: it sets the embedding dimension to reduce dimensionality, represents each word as a vector of that dimension, and uses Word2vec, a neural network-based word embedding model, to produce these representations.
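Word2vec's skip-gram variant learns word vectors by training a shallow network to predict context words from each center word. The sketch below shows only the two pieces around that training step: generating (center, context) pairs within a window, and comparing learned vectors with cosine similarity. The window size and the example vectors are hypothetical; in practice a library such as gensim's `Word2Vec` class would train the vectors.

```python
import math


def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs as used by skip-gram Word2vec."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cosine(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


# Hypothetical 3-dimensional vectors standing in for trained embeddings.
vectors = {
    "glucose": [0.9, 0.1, 0.0],
    "sugar": [0.8, 0.2, 0.1],
    "headache": [0.0, 0.3, 0.9],
}
```

With these illustrative vectors, `cosine(vectors["glucose"], vectors["sugar"])` is much larger than `cosine(vectors["glucose"], vectors["headache"])`, which is the property that lets semantically related health terms share similar features.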

In another aspect, the medical monitoring method proposed in the present invention, which uses wearable sensors and social networking data with machine learning and semantic knowledge-based big data analysis, includes: collecting data acquired through wearable devices or sensors, patients' medical records, discussion data from patients' and doctors' social networks, and data from medical webpages; storing the collected data through the data storage unit of a big data cloud server connected to a personal cloud server; and performing, through the big data analysis engine, data analysis that includes pre-analysis of the collected data, preprocessing and filtering of the collected sensor data, preprocessing of medical records, and preprocessing of social network content, as well as feature polarity identification and document labeling. This analysis step extracts word-embedding and ontology-based features from the text portion of the stored data, extracts features from the wearable device or sensor data, applies principal component analysis as a statistical approach to dimensionality reduction, and uses a Bi-LSTM to classify diabetes and BP status and predict drug side effects.
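Principal component analysis reduces dimensionality by projecting centered data onto the directions of greatest variance. A minimal NumPy sketch, assuming the features have already been assembled into a numeric matrix (rows are samples, columns are features); the number of retained components `k` is a hypothetical choice, not a value given in the document.

```python
import numpy as np


def pca(X, k):
    """Project X onto its top-k principal components.

    Returns (projected data, component axes, explained-variance ratio of each kept axis).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each feature
    # SVD of the centered data: rows of Vt are principal axes, sorted by singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    explained = (S ** 2) / (S ** 2).sum()
    return Xc @ components.T, components, explained[:k]
```

For data lying on a line, such as `[[1, 2], [2, 4], [3, 6], [4, 8]]`, a single component captures essentially all of the variance, so the two features collapse to one without loss.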

According to embodiments of the present invention, the proposed big data analysis engine is based on data mining techniques, an ontology, and a bidirectional long short-term memory (Bi-LSTM) network. The data mining techniques efficiently preprocess medical data, extract useful features, and reduce the dimensionality of the data. The proposed ontology provides semantic knowledge of entities and aspects, and of their relationships, in the diabetes and blood pressure domains, and the Bi-LSTM accurately classifies medical data to predict patients' drug side effects and abnormal conditions. The proposed apparatus is applied to patient classification using medical data related to diabetes, BP, mental health, and drug reviews; the proposed model processes heterogeneous data accurately and improves the accuracy of health status classification and drug side-effect prediction.

FIG. 1 is a diagram illustrating a medical monitoring apparatus using machine learning and semantic knowledge-based big data analysis according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining a big data cloud server and HDFS according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating a big data analysis engine according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining embedding and ontology-based feature extraction according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining LSTM-based classification and prediction according to an embodiment of the present invention.
FIG. 6 is a flowchart illustrating a medical monitoring method using machine learning and semantic knowledge-based big data analysis according to an embodiment of the present invention.

To improve classification accuracy, the present invention proposes an advanced medical monitoring architecture based on a bidirectional long short-term memory (Bi-LSTM) network for accurately analyzing medical big data. The proposed architecture integrates different information sources for efficient medical monitoring of chronic patients. The proposed system was applied to the classification of patients using health data related to diabetes, BP, mental health, and drug reviews. The results demonstrate that the proposed model correctly handles heterogeneous data and improves the accuracy of patient health status classification.
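The bidirectional idea can be sketched with a plain NumPy forward pass: one LSTM reads the feature sequence left to right, a second reads it right to left, and their final hidden states are concatenated and passed to a softmax layer. This is an untrained illustration with random weights, not the patented model; the layer sizes, the three hypothetical output classes (for example normal, high BP, low BP), and the use of only the final states are assumptions, and a real system would train the network with a deep learning framework.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTM:
    """Single-layer LSTM with stacked gate weights (input, forget, candidate, output)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        self.W = rng.normal(0.0, 0.1, (4 * hidden_dim, input_dim))
        self.U = rng.normal(0.0, 0.1, (4 * hidden_dim, hidden_dim))
        self.b = np.zeros(4 * hidden_dim)

    def final_state(self, xs):
        """Run the sequence step by step and return the last hidden state."""
        H = self.hidden_dim
        h = np.zeros(H)
        c = np.zeros(H)
        for x in xs:
            z = self.W @ x + self.U @ h + self.b
            i = sigmoid(z[:H])            # input gate
            f = sigmoid(z[H:2 * H])       # forget gate
            g = np.tanh(z[2 * H:3 * H])   # candidate cell state
            o = sigmoid(z[3 * H:])        # output gate
            c = f * c + i * g
            h = o * np.tanh(c)
        return h


class BiLSTMClassifier:
    """Untrained Bi-LSTM classifier with random weights (illustration only)."""

    def __init__(self, input_dim, hidden_dim, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.forward = LSTM(input_dim, hidden_dim, rng)
        self.backward = LSTM(input_dim, hidden_dim, rng)
        self.W_out = rng.normal(0.0, 0.1, (n_classes, 2 * hidden_dim))
        self.b_out = np.zeros(n_classes)

    def predict_proba(self, xs):
        xs = np.asarray(xs, dtype=float)
        h_fwd = self.forward.final_state(xs)
        h_bwd = self.backward.final_state(xs[::-1])  # same sequence, read in reverse
        logits = self.W_out @ np.concatenate([h_fwd, h_bwd]) + self.b_out
        exp = np.exp(logits - logits.max())  # numerically stable softmax
        return exp / exp.sum()
```

Usage: `BiLSTMClassifier(input_dim=4, hidden_dim=8, n_classes=3).predict_proba(seq)` on a `(time_steps, 4)` array returns three class probabilities; only after training would these be meaningful.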

According to an embodiment of the present invention, a new framework is built to extract large volumes of the most useful medical data from various sources, such as smartphones, wearable sensors, medical records, and social networks. Big data cloud storage is used to store the extracted data, and MapReduce is applied to process structured and unstructured data intelligently.
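The MapReduce pattern can be illustrated with a single-process simulation that averages readings per patient and sensor. This sketch only mirrors the map, shuffle, and reduce phases in plain Python; it does not use the Hadoop API, and the record layout `(patient, sensor, value)` is hypothetical.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit ((patient, sensor), value) for each raw reading."""
    for patient, sensor, value in records:
        yield (patient, sensor), value


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: average the readings for each (patient, sensor) key."""
    return {key: sum(values) / len(values) for key, values in groups.items()}


def run_job(records):
    return reduce_phase(shuffle(map_phase(records)))
```

In a real cluster the map and reduce functions run in parallel on different nodes over data in HDFS; keeping them free of shared state, as here, is what makes that distribution possible.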

A big data analysis engine is proposed for real-world big data analysis. It is used to accurately process medical data that contains inconsistencies and has missing values, noise, differing formats, large size, and high dimensionality, and to improve data processing quality while saving time. The big data analysis engine uses artificial intelligence approaches to extract useful features and reduce the dimensionality of the data.
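Feature extraction from wearable sensor streams is often done by computing summary statistics over sliding windows. The document does not specify which features are extracted, so the window size, the step, and the choice of mean, standard deviation, minimum, and maximum below are assumptions for illustration.

```python
import statistics


def window_features(values, size, step):
    """Compute summary statistics over sliding windows of a sensor signal."""
    if size < 1 or step < 1:
        raise ValueError("size and step must be positive")
    features = []
    for start in range(0, len(values) - size + 1, step):
        window = values[start:start + size]
        features.append({
            "mean": statistics.fmean(window),
            "std": statistics.pstdev(window),  # population standard deviation
            "min": min(window),
            "max": max(window),
        })
    return features
```

For a heart-rate trace `[70, 72, 74, 90]` with `size=3, step=1`, the second window's maximum of 90 stands out against the first window's mean of 72, giving the classifier a compact signal of a sudden change.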

A neural network-based word embedding model called Word2vec is used to represent healthcare text data with its semantic meaning. In addition, a domain-specific ontology is integrated with the Word2vec model. The ontology supplies the neural network model with additional information that helps it understand the semantic meaning of unusual words.
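One simple way to combine an ontology with word embeddings is to append ontology-concept indicators to an averaged embedding vector, so that a word with no learned vector (for example, a rare side-effect term) still contributes semantic information. The tiny ontology, the concept names, and this combination scheme are hypothetical; the document does not specify how the ontology and Word2vec are integrated.

```python
# Hypothetical domain ontology: term -> concept, and concept -> parent concept.
ONTOLOGY_TERMS = {
    "metformin": "AntidiabeticDrug", "insulin": "AntidiabeticDrug",
    "nausea": "SideEffect", "dizziness": "SideEffect",
    "glucose": "BloodGlucose", "hypertension": "HighBP",
}
ONTOLOGY_PARENTS = {
    "AntidiabeticDrug": "Drug", "SideEffect": "Symptom",
    "BloodGlucose": "DiabetesIndicator", "HighBP": "BPDisorder",
}
CONCEPTS = sorted(set(ONTOLOGY_TERMS.values()) | set(ONTOLOGY_PARENTS.values()))


def concepts_for(tokens):
    """Collect each token's concept plus all of its ancestor concepts."""
    found = set()
    for tok in tokens:
        concept = ONTOLOGY_TERMS.get(tok)
        while concept is not None:
            found.add(concept)
            concept = ONTOLOGY_PARENTS.get(concept)
    return found


def combined_features(tokens, embeddings, dim):
    """Average word embedding followed by binary ontology-concept indicators."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    avg = [sum(col) / len(vecs) for col in zip(*vecs)] if vecs else [0.0] * dim
    found = concepts_for(tokens)
    return avg + [1.0 if c in found else 0.0 for c in CONCEPTS]
```

Even if "dizziness" has no embedding, it still switches on the `SideEffect` and `Symptom` indicators, which is the kind of extra semantic information the ontology is meant to supply.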

Semantic knowledge combined with a Bi-LSTM model is used to classify unstructured and structured medical data. The proposed model, based on semantic knowledge and a Bi-LSTM, is compared with other classifiers (e.g., convolutional neural network (CNN), MLP, SVM, fuzzy logic classifier, logistic regression, random forest, and KNN). In addition, principal component analysis (PCA) and information gain (IG) are each used together with the proposed model, and the results are compared. This comparison helps determine the strengths and limitations of the proposed approach and classification model. Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
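Information gain (IG), mentioned above as a feature-selection criterion alongside PCA, measures how much knowing a feature's value reduces the entropy of the class label: IG(Y; X) = H(Y) - Σ_v P(X = v) H(Y | X = v). A minimal sketch for discrete feature values; continuous sensor features would first need to be discretized, which is an assumption not covered here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG of a discrete feature with respect to the class labels."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```

A feature that perfectly separates diabetic from non-diabetic samples yields 1 bit of gain on a balanced binary label, while a feature unrelated to the label yields 0, so ranking features by IG keeps the informative ones.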

Machine learning technology and big data play a key role in medical monitoring systems for chronic patients. With the rapid increase in the use of wearable sensors and social networking data in the medical field,

Regarding ML approaches and medical monitoring of diabetic and BP patients using wearable sensors, there have been various studies on extracting physiological information with wearable sensors and monitoring the health of patients with chronic diseases. These studies introduced a variety of methods for analyzing wearable sensor data. A personalized medical monitoring system based on wearable sensors and ML was provided for monitoring diabetic patients. This system collects data using Bluetooth Low Energy sensors and then applies a multilayer perceptron and an LSTM to classify the type of diabetes and predict the user's glucose level. A framework based on ML methods and a Hadoop environment was proposed for diabetes prediction. That study proposed an information gain algorithm for feature extraction and predicted diabetes using naive Bayes, decision trees, and random forests; however, the accuracy of these classifiers decreased significantly as the amount of data increased. A 5G smart diabetes system based on wearable sensors, big data, and ML was provided for personalized diabetes diagnosis. The system has five main goals (smartness, personalization, comfort, effectiveness, and sustainability), helping patients detect diabetes early and providing personalized treatment solutions. A recommendation system based on wearable technology and the Internet of Things (IoT) was proposed for proactive medical monitoring. It addressed two major problems: identifying the factors that should be monitored and identifying the wearable technologies that should be used to measure them. An LSTM-based model was presented to effectively mine sensor data and laboratory tests. There, LSTM and MLP models were trained using different attributes of intensive care unit (ICU) patients, including body temperature, systolic blood pressure, blood glucose, and heart rate, after which the effectiveness of these classifiers was verified and their performance compared.

A big data framework has been proposed for diabetes. Such studies addressed patients' current diabetes status and offered solutions for diabetes management, and they also discussed the challenges of big data in the medical field. A bidirectional LSTM that predicts future blood glucose levels has been proposed; that work compared the results of a simple LSTM and a Bi-LSTM using 26 data sets from 20 real patients. A framework covering topics such as data collection, data processing, and system response was also discussed, along with the differences between wearable devices and human factors.

A knowledge discovery approach has been proposed for medical support. The system simplified big data analysis in a cloud environment and described a new method for determining classifications to predict abnormal conditions in blood pressure patients.

Among medical monitoring systems based on social networking data and ML approaches, widely discussed systems use social networking data and drug information. An emotional healthcare system has been proposed to detect patients' psychological disorders. It used patient messages posted on social networking platforms to check depression and stress levels, applied CNNs, recurrent neural networks (RNNs), and Bi-LSTMs to detect stress and depression, and proposed an ontology-based recommendation system that sends text messages to patients based on the monitoring results. A deep learning-based sentiment analysis for congenital depression was presented. It used WeChat friends-circle data to train an LSTM and used emoticons as features to observe patients' congenital depression; the system shortened examination time and reduced the cost of communication between doctors and patients. An artificial intelligence approach has been proposed to analyze patient posts on medical social networking platforms and identify patients' serious problems. The system applied text preprocessing methods and ML classifiers to analyze patient posts automatically and notify doctors when needed. A new system has been proposed to identify diabetes risk from users' social networking activity on Twitter; it presented a new approach to data generation and an ML method to classify users' diabetes risk based on their Twitter activity. Bi-LSTMs and gated recurrent units have been used to intelligently process social media comments from patients undergoing treatment. That work focused on what patients say about drugs and discovered disease-related medical concepts from it, considering different types of text to detect diseases (for example, a depressive disorder was identified in patients who wrote text such as "waking up too early").

A new system has been proposed to prevent adverse drug events (ADEs). It focused on ADE prescriptions to improve patient safety and optimize medical procedures. A feature-based sentiment analysis of drug reviews was presented for detecting drug side effects; the system performs several tasks on drug reviews to identify drug effects and side effects. A system based on sentiment features has been proposed to detect posts about adverse drug reactions on social networks, and it conducted experiments on a large amount of text to develop a practical approach to identifying posts related to drug side effects.

In the field of diabetes treatment, an ontology-based, feature-level sentiment analysis system has been proposed. It used an ontology to provide semantic relationships between features, which enabled the classification task to be performed successfully. An ontology mapping framework was proposed to discover relationships between ontology features and the features required in the text; that system used various ontologies from BioPortal and proposed a new approach for extracting features from ontologies for word embedding. Deep learning for medical knowledge embedding has also been proposed; that work employed an epilepsy-related biomedical ontology to find medical concepts in clinical documents automatically. An ontology was proposed to improve the performance of deep learning for extracting disease names from Twitter text, presenting a neural network structure that uses the ontology to extract disease names from tweets automatically. A system called BO-LSTM was presented for detecting biomedical information in text. This approach overcomes the problem of semantic entities missing from word embeddings: to strengthen the extraction of biomedical relationships with a deep learning LSTM, it used a domain ontology together with word embeddings.

Regarding medical big data and deep learning approaches, analyzing and representing medical big data in low-dimensional form is another major issue for medical monitoring systems. Recently, various systems have been proposed that analyze big data, reduce its dimensionality, and manage large volumes of heterogeneous data using machine learning (ML). In one prior art, a telemedicine application was used to collect patient data, and a fuzzy rule-based classifier was then applied to the data; the authors also discussed issues related to information clustering, information retrieval, and parallel processing of big data in a cloud environment. In another prior art, a recommendation system, a numerical reputation system, and context investigation were proposed to ensure the veracity of big data, and ML methods and analysis algorithms were used to handle data volume and manage data velocity. In another prior art, a cloud-centric, IoT-based disease diagnosis framework was proposed to predict students' diseases: student healthcare data was generated using medical sensors and the University of California, Irvine (UCI) data set, and an ML algorithm was trained to predict student diseases. In another prior art, a cloud- and IoT-based mobile healthcare application was proposed for monitoring and treating serious diseases, in which a fuzzy neural classifier was applied to diagnose severe diabetes using a UCI repository data set. In yet another prior art, big data analysis was proposed for an effective health recommendation system; the system processes large amounts of structured and unstructured data extracted from patients' social networking activities and applies an ML algorithm to recommend treatments for patients.

In another prior art, the effect of IoT on the results of big data sentiment analysis was discussed: the concept, characteristics, and decision-making value of big data sentiment analysis were introduced first, followed by the framework required to process big data. In another prior art, random forest and MapReduce were used to build a big-data-analytics-based IoT healthcare system. That system considered patients with different diseases and applied an improved dragonfly algorithm to select the optimal features for classification. In another prior art, a new IoT-based framework was proposed to collect sensor data for medical applications; it uses Apache HBase and Apache Pig to collect and store sensor data and predicts heart disease by applying a predictive model built on MapReduce. In another prior art, Twitter data was analyzed in parallel in a Hadoop environment and classified into three classes: positive, neutral, and negative. That system provides a semantic similarity formula that efficiently identifies the semantic relations between words in a tweet. Another prior art proposed Hadoop-based big data processing with a maximum entropy (MaxEnt) classifier for healthcare systems; it used patients' Twitter data and applied sentiment analysis to predict patients' health status.

Most of the systems described above are based on traditional methods and address the problems of big data only to a limited extent. There is no clear framework that uses big data analytics to process and analyze big data with high accuracy. These systems are also still insufficient for handling many types of structured and unstructured data, and because ML classifiers lack medical semantic knowledge, they can misclassify social networking data.

FIG. 1 is a diagram illustrating a medical monitoring apparatus using machine learning and semantic knowledge-based big data analysis according to an embodiment of the present invention.

The proposed medical monitoring apparatus using machine learning and semantic knowledge-based big data analysis includes a data source 110 (comprising a sensor 111, hospital medical records 112, a hospital SNS 113, and a patient SNS 114), a data collection unit 120, a data storage unit 130, a big data analysis engine 140, and a data display unit 150.

The data source 110 provides heterogeneous data. Sensor devices, medical records, and social networking platforms are the main sources of data.

The data collection unit 120 collects data from different domains for patients with diabetes and blood pressure (BP) disorders. It collects physiological information from wearable devices, medication and symptom information from smartphones, and social network discussions between patients and doctors through application programming interfaces (APIs).

The data storage unit 130 offloads the monitoring data of patients using wearable devices, together with the data collected from social networks, to a cloud server over a wireless communication network.

The big data analysis engine 140 is the most important component of the proposed framework. It is divided into two sublayers: a data computation unit 141 and a data classification unit 142. The data computation unit 141 includes submodules for data preprocessing 141a, data pre-analysis 141b, feature extraction 141c, dimensionality reduction 141d, and word embedding 141e. Here, ontology-based semantic knowledge 143 is used together with a soft computing approach to process and analyze the data needed to extract the required information. The data classification unit 142 uses a Bi-LSTM 142a with the ontology to classify diabetes, BP, mental health, and adverse drug reactions. By intelligently analyzing multidimensional big data related to diabetes and BP, the data classification unit 142 derives insight for decision making and provides patients with a personalized diabetes and BP care system.

The proposed system uses Hadoop MapReduce together with ML to reduce large-scale patient care data. The final layer is the data display unit 150, which provides the analysis results to doctors. The data display unit 150 combines the results generated for healthcare monitoring 151 and recommendation 152 with the doctor's proposed treatment, and then recommends personalized diabetes and BP healthcare treatment to the patient. The proposed system helps warn patients with diabetes and BP disorders before their health risks reach high levels, and it supports doctors in providing practical treatment by intelligently monitoring their patients' health status.

More specifically, the data collection unit 120 according to an embodiment of the present invention collects data from four different sources. First, real-time body signals of the patient are collected with smart sensors and wearable devices, such as blood pressure monitors, glucometer sensors, smartwatches, pulse oximeters, accelerometers, temperature sensors, and weight scales, to monitor physiological signs such as blood pressure, blood glucose level, heart rate, stress ratio, oxygen saturation, temperature, and weight. In the proposed system, wearable devices are placed on the patient's body to monitor body functions. In addition, as the use of mobile devices has grown substantially, health-related applications have become well developed in recent years. Mobile devices contain sensors that make it possible to collect accurate information about different parts of the body. Smartphones are therefore used to collect diet, exercise, and other activity information from patients with diabetes or BP disorders, as well as personal information (age, gender, height, and so on). This information is important for healthcare and disease prevention. However, smartphone data is highly unstable and easily corrupted, which makes it difficult to use for accurate health management and related information extraction. Data mining techniques and ontology-based semantic knowledge were therefore used to manage sensor data and extract relevant information for efficient medical services.

Medical records are data on the treatment received by patients with diabetes and BP disorders. The patient's medical records are collected, including records of treatment, laboratory tests, and drug intake. These records can be analyzed to extract valuable information that can help improve medical guidelines for diagnosing diabetes and BP disorders. However, medical records are usually large in volume, and each record consists of scattered variables and high-dimensional data. In addition, patients with diabetes and BP disorders may develop other complications, such as kidney and cardiovascular disease, neuropathy, and skin and eye diseases. Medical records must therefore be analyzed to identify patients affected by such complications and to monitor their condition with more specific tests.

The proposed system first extracts patient content from hospital social networking platforms. This task requires additional work, however, and depends entirely on the privacy settings of the social network. Because the APIs of some social networks are not public, special software such as wrappers can be used in that case to extract the information (for example, patient posts). Patients with diabetes and BP disorders generally contact their doctors regularly, but they also need support, knowledge, and skills to monitor their own health. If patients do not obtain sufficient information from their doctors, social media can play an important role in meeting their needs. Social networking platforms such as Facebook and Twitter thus give patients opportunities to gain sufficient knowledge about diabetes and BP and to connect with people who have similar health problems or experiences, and they provide a platform on which both patients and doctors can share knowledge about diabetes treatment. Social media data such as drug reviews and patients' emotional posts are collected to predict levels of stress and depression, to identify side effects of diabetes drugs in the context of diet and lifestyle, and to improve patient treatment and knowledge.

Social networks have become increasingly popular in healthcare, and patients connect with online communities to meet their needs. Various Facebook communities, such as the American Diabetes Association, Diabetes Recipes and Food Hints, Diabetes Daily, and Diabetes Health, provide platforms on which patients share information with one another; patients can react to posts and share them with others. Data was extracted from Facebook pages using the Graph API with RestFB, a Java client. The Graph API makes it possible to extract information from Facebook automatically. First, community pages containing information about diabetes patients and medications were selected. All posts published on these community pages between January 2017 and January 2019 were then extracted, and the reactions (reactions and emotions) to these posts were collected and stored for further processing.
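The description fetches posts with the Graph API through RestFB (Java); the sketch below covers only the next step, keeping already-fetched post records whose publication time falls in the January 2017 to January 2019 window, and is written in Python for illustration. The `created_time` field name, its timestamp format, and treating "January 2019" as inclusive through the end of that month are assumptions, not details from the source.

```python
from datetime import datetime, timezone

# Assumed collection window: 2017-01-01 up to (but excluding) 2019-02-01,
# i.e. "January 2019" is treated as inclusive (an assumption).
WINDOW_START = datetime(2017, 1, 1, tzinfo=timezone.utc)
WINDOW_END = datetime(2019, 2, 1, tzinfo=timezone.utc)


def parse_created_time(value: str) -> datetime:
    """Parse a timestamp such as '2018-06-15T08:30:00+0000' (assumed format)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")


def posts_in_window(posts):
    """Keep posts whose 'created_time' lies inside the collection window."""
    kept = []
    for post in posts:
        created = parse_created_time(post["created_time"])
        if WINDOW_START <= created < WINDOW_END:
            kept.append(post)
    return kept


# Hypothetical post records shaped like exported page posts.
sample = [
    {"id": "p1", "created_time": "2016-12-31T23:59:59+0000"},
    {"id": "p2", "created_time": "2018-06-15T08:30:00+0000"},
    {"id": "p3", "created_time": "2019-01-20T10:00:00+0000"},
    {"id": "p4", "created_time": "2019-03-01T00:00:00+0000"},
]
print([p["id"] for p in posts_in_window(sample)])  # ['p2', 'p3']
```

Comparing timezone-aware datetimes avoids off-by-hours errors when pages report times with different UTC offsets.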

Tweets containing diabetes-related data were retrieved using Twitter's Streaming API and REST API; the REST API was used with various queries to retrieve recent tweets. The queries should use the most specific diabetes-related terms in order to retrieve the most relevant tweets. Keyword-based queries were therefore written with the Boolean operators AND and OR, for example: diabetes patient AND (blood pressure OR blood sugar); hypertension patient AND (heart rate OR stress ratio OR blood sugar); and diabetes treatment patient AND (diabetes OR drug). More than 30 keywords were used to construct the diabetes-related queries, and these keyword-based queries yielded 300,000 tweets about diabetes patients, treatments, drugs, and symptoms.
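To make the Boolean query structure concrete, the following sketch represents a query such as diabetes patient AND (blood pressure OR blood sugar) as a nested tuple, renders it as a query string, and evaluates it locally against tweet text. It does not call the Twitter API, whose exact query syntax is not reproduced here; the case-insensitive substring matching and the sample tweets are assumptions for illustration.

```python
def matches(query, text):
    """Recursively evaluate a nested AND/OR keyword query against text.

    A query is either a keyword string (case-insensitive substring match)
    or a tuple (operator, operand, operand, ...) with operator "AND" or "OR".
    """
    if isinstance(query, str):
        return query.lower() in text.lower()
    op, *operands = query
    results = (matches(q, text) for q in operands)
    if op == "AND":
        return all(results)
    if op == "OR":
        return any(results)
    raise ValueError(f"unsupported operator: {op}")


def to_query_string(query):
    """Render the nested query; multi-word keywords are double-quoted."""
    if isinstance(query, str):
        return f'"{query}"' if " " in query else query
    op, *operands = query
    return "(" + f" {op} ".join(to_query_string(q) for q in operands) + ")"


query = ("AND", "diabetes patient", ("OR", "blood pressure", "blood sugar"))
tweets = [  # hypothetical tweets
    "New diabetes patient here, blood sugar was 180 this morning",
    "Checked my blood pressure today",
    "Diabetes patient tip: walk after meals",
]
print(to_query_string(query))
# ("diabetes patient" AND ("blood pressure" OR "blood sugar"))
print([matches(query, t) for t in tweets])  # [True, False, False]
```

Keeping the query as a tree rather than a flat string lets the same definition drive both the remote search and a local re-check of the retrieved tweets.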

Patients with diabetes want to know about the side effects, diseases, and symptoms that taking a particular medication can cause, and they commonly share their experiences of drug side effects on social media websites. Such websites contain a large amount of information on the side effects of antidiabetic drugs, so they were chosen as data sources for the proposed system. The names of diabetes and BP medications were used as search queries, 1,600 posts were retrieved, and a keyword-based search engine was used to extract only the posts containing drug-related information. In addition, the proposed system employs an automatic web crawler that collected 25,000 user comments (reviews) on drugs.
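The keyword-based extraction of drug-related posts can be sketched as a lexicon filter: a post is kept if it mentions at least one drug name, matched case-insensitively on word boundaries. The drug list below is hypothetical (the source does not give its lexicon), and this regular-expression filter is a stand-in for whatever search engine the system actually used.

```python
import re

# Hypothetical lexicon of diabetes and BP drug names (not from the source).
DRUG_LEXICON = ["metformin", "insulin", "glipizide", "lisinopril", "amlodipine"]

# One case-insensitive pattern with word boundaries, so "insulin" does not
# match inside a longer token such as "insulinoma".
DRUG_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, DRUG_LEXICON)) + r")\b", re.IGNORECASE
)


def drug_related_posts(posts):
    """Return (post, sorted matched drug names) for posts naming a drug."""
    results = []
    for post in posts:
        found = sorted({m.lower() for m in DRUG_PATTERN.findall(post)})
        if found:
            results.append((post, found))
    return results


posts = [  # hypothetical posts
    "Metformin gave me stomach cramps",
    "Great weather today",
    "Switched from glipizide to insulin last week",
]
for post, drugs in drug_related_posts(posts):
    print(drugs, "<-", post)
```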

FIG. 2 is a diagram for explaining a big data cloud server and HDFS according to an embodiment of the present invention.

The data storage unit 130 of the big data cloud server considers four different data sources to monitor the patient's condition and provide useful information. First, wearable device data is used to extract physiological information about individual patients. Second, the patient's medical records are collected to determine treatment and medical history. Third, the patient's content is extracted from SNS to understand the patient's feelings, emotions, and stress. Fourth, patient reviews of drugs are collected from medical web pages to identify the side effects of the drugs currently being taken. Such large volumes of data from four different sources are difficult to store and process. Moreover, the number of patients with various diseases is increasing rapidly, and the medical industry is generating large amounts of data as a result. A smart approach to handling and processing the data efficiently is therefore needed, and big data cloud storage is used so that the extracted data can be accessed easily anytime and anywhere. The data must be stored intelligently so that information can be retrieved accurately and quickly. Because the data used in the proposed work has high volume, variety, and velocity, existing storage and retrieval methods may not be able to process data from these different sources.

The proposed system connects to a private cloud server on Amazon S3, which provides highly scalable and secure storage for patient information. Amazon S3 stores data in buckets, which are processed for different purposes. Each bucket is assigned a unique name and URL, and data is uploaded to Amazon S3 using s3cmd, a command-line tool that lets users upload, retrieve, and manage data in the cloud storage.

Amazon S3 provides the ability to upload data to the HBase 231 cluster. As shown in FIG. 2, wearable sensor and social networking data are transferred from the cloud 210 server to the Hadoop Distributed File System (HDFS) 232. HDFS 232 is a distributed file storage system that provides high bandwidth for transferring data to the HBase 231 cluster. HBase 231 runs on top of HDFS 232 and is a distributed database management system that stores data in the form of rows and columns. Flume 220 is used to transfer data from the cloud 210 server to the Hadoop ecosystem. Flume 220 contains three units (source, decorator, and sink), which respectively specify the origin of the data, the decoration of the data (for example, compression and decompression), and the target of the data for a specific purpose. Apache Pig 241 is used to pass the data extracted from the wearable devices to MapReduce 242; Apache Pig 241 processes structured and unstructured data and prepares it for analysis. MapReduce 242 is Hadoop's parallel processing system for large data sets in a distributed environment. It has two main functions, Map and Reduce: the map task collects values from the big data as key-value pairs, and the reduce function combines the values grouped under each key and stores the result. In this way, MapReduce processes structured and unstructured data intelligently. The stored data is then passed to the analytics engine 250.
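The Map and Reduce roles described above can be illustrated with a single-process Python simulation of the data flow: map emits key-value pairs, a shuffle groups the values by key, and reduce collapses each group into one result. This is not the Hadoop API; the record fields and the choice of mean blood glucose per patient as the reduce operation are hypothetical.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit one (key, value) pair per reading, keyed by patient ID."""
    for record in records:
        yield record["patient_id"], record["glucose"]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's values into one summary (mean glucose)."""
    return {key: sum(values) / len(values) for key, values in groups.items()}


readings = [  # hypothetical wearable readings (mg/dL)
    {"patient_id": "P1", "glucose": 110},
    {"patient_id": "P2", "glucose": 180},
    {"patient_id": "P1", "glucose": 130},
    {"patient_id": "P2", "glucose": 160},
]
print(reduce_phase(shuffle(map_phase(readings))))  # {'P1': 120.0, 'P2': 170.0}
```

In Hadoop the map and reduce calls run on different nodes and the shuffle moves data over the network, but the contract between the three steps is the same as in this sketch.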

FIG. 3 is a diagram illustrating a big data analysis engine according to an embodiment of the present invention.

The data consists of sensed data, medical records, and social networking data. Real-world big data is extremely difficult to process, however, because of inconsistencies, missing values, noise, differing formats, large size, and high dimensionality, and low-quality, noisy data leads to low-quality results. Data preprocessing steps are therefore applied before the actual processing to improve processing quality and save time. As shown in FIG. 3, the proposed system includes pre-analysis of sensor data, preprocessing and filtering of sensor data, preprocessing of medical records, and preprocessing of social network content.

In the pre-analysis of sensor data, the patient's physiological information is extracted using wearable biomedical and behavioral sensors. As shown in Table 1, different parameters are sensed from patients with diabetes and BP disorders using sensors and smartphones.

Table 1. Parameters collected from sensors, smartphones, and hospital medical records

Source | ID | Parameter | Description
Sensor | S1 | Blood glucose | Blood glucose (mg/dL)
Sensor | S2 | Body temperature | Patient's current body temperature
Sensor | S3 | Blood pressure | Systolic blood pressure (mmHg); diastolic blood pressure (mmHg)
Sensor | S4 | Oxygen saturation | Patient's oxygen saturation, SpO₂ (%)
Sensor | S5 | Heart rate | Patient's heart rate (bpm)
Sensor | S6 | ECG | Patient's electrocardiogram from an ECG sensor
Sensor | S7 | EEG | Patient's electroencephalogram from an EEG sensor
Sensor | S8 | Stress | Patient's stress computed from ECG + EEG patterns
Smartphone | SP1 | Age | Patient's age from date of birth
Smartphone | SP2 | Height | Patient's height from a height-measuring device
Smartphone | SP3 | BMI | Body mass index (kg/m²)
Smartphone | SP4 | Gender | Patient's gender (0/1)
Smartphone | SP6 | Activity | Lifestyle: sedentary, lightly active, moderately active, active, or very active
Hospital medical records | MR1 | Lipoprotein level | Low-density lipoprotein level (LDL cholesterol); high-density lipoprotein level (HDL cholesterol)
Hospital medical records | MR2 | Hemoglobin | Patient's glycated hemoglobin, HbA1c (%)
Hospital medical records | MR3 | Blood glucose | Patient's blood test
Hospital medical records | MR4 | Serum creatinine | Patient's blood test
Hospital medical records | MR5 | Triglycerides | Patient's blood test
Hospital medical records | MR6 | Cholesterol | Patient's blood test
Hospital medical records | MR7 | AST (SGOT) | Blood test for liver damage
Hospital medical records | MR8 | ALT (SGPT) | Blood test for liver damage
Hospital medical records | MR9 | Drug intake | Extracted from the prescription list
Hospital medical records | MR10 | Smoking | Yes / No (self-report)
Hospital medical records | MR11 | Drinking | Yes / No (self-report)
Hospital medical records | MR12 | Indigestion | Yes / No (self-report)
Hospital medical records | MR13 | Family medical history | Yes / No (self-report)
Hospital medical records | MR14 | Patient's medical history | Yes / No (self-report)

The sensed parameters cover most symptoms of diabetes, abnormal BP, and other diseases. Other parameters are also extracted from the patient's body, but for simplicity only these parameters are discussed in the proposed work. Pre-analysis consists of several steps. First, the data set is converted into a comma-separated values (CSV) file for easy parsing. A name is assigned to each attribute (parameter), and the attributes are displayed as columns of numeric values. Next, each sensor data ID is replaced with the actual sensor name (for example, S1 is the blood glucose ID). Different attributes are used for different purposes; for example, the blood glucose, blood pressure, heart rate, body mass index (BMI), age, gender, and activity attributes are used to classify the health status of patients with diabetes and BP disorders. However, instead of using explicit years and hours for age and activity, respectively, the age attribute is divided into day, month, and year, and activity time is divided into morning, afternoon, and evening. These derived features provide valuable insight into patient health, such as patterns of normal and abnormal conditions that depend on whether a serious condition occurs at the beginning or end of a month, or during the starting or ending months. The final data set is then uploaded to the Hadoop cloud environment for further processing.
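A minimal sketch of the pre-analysis steps above, assuming readings arrive as CSV rows with a `patient_id`, a `timestamp` in `YYYY-MM-DD HH:MM` form, and sensor-ID columns from Table 1. It renames the sensor IDs to readable names and derives day, month, year, and a morning/afternoon/evening period. The column names, timestamp format, and period boundaries (before 12:00, 12:00 to 17:59, 18:00 onward) are assumptions, not values given in the source.

```python
import csv
import io
from datetime import datetime

# Sensor IDs from Table 1 mapped to readable column names (names assumed).
SENSOR_NAMES = {"S1": "blood_glucose", "S2": "body_temperature", "S5": "heart_rate"}


def time_of_day(hour: int) -> str:
    """Bucket an hour into three periods (boundaries are an assumption)."""
    if hour < 12:
        return "morning"
    if hour < 18:
        return "afternoon"
    return "evening"


def pre_analyze(raw_csv: str):
    """Rename sensor-ID columns and add day/month/year/period features."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        ts = datetime.strptime(row.pop("timestamp"), "%Y-%m-%d %H:%M")
        out = {SENSOR_NAMES.get(key, key): value for key, value in row.items()}
        out.update(day=ts.day, month=ts.month, year=ts.year,
                   period=time_of_day(ts.hour))
        rows.append(out)
    return rows


raw = (
    "patient_id,timestamp,S1,S5\n"
    "P1,2019-03-01 08:15,112,72\n"
    "P1,2019-03-31 19:40,145,88\n"
)
for row in pre_analyze(raw):
    print(row)
```

Sensor values stay as strings here; converting them to numbers is left to the filtering stage that follows.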

Data collected with wearable sensors has several limitations. It contains much inaccurate and useless information, and it is corrupted by signal artifacts such as noise and missing values, which significantly degrade classification performance. Therefore, in the preprocessing and filtering of sensor data, the data is preprocessed and filtered before analysis to remove inconsistencies and noise. The data is cleaned by removing extraneous ASCII characters, and a well-known filtering method, Kalman filtering, is used to remove noise. In addition, an unsupervised filter called ReplaceMissingValues replaces all missing values in the data set with the means and modes of the available data. Useless attributes are removed with an unsupervised filter called RemoveUseless, using a maximum variance of 90%. The numeric values are then normalized with a normalization filter so that they lie between 0 and 1 for all classifications. After these steps, EmEditor is used to split the data set into n data files, which are uploaded to the Hadoop cloud environment for further processing.
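ReplaceMissingValues and RemoveUseless are the names of unsupervised attribute filters in the Weka toolkit; the sketch below does not call Weka but re-creates three of the described ideas in plain Python for one numeric signal: mean imputation of missing values, a scalar Kalman filter for noise, and min-max normalization to [0, 1]. It imputes before smoothing because each Kalman update needs a measurement, and it omits the RemoveUseless step. The noise variances and the sample readings are assumed values.

```python
def replace_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def kalman_1d(measurements, process_var=1e-3, meas_var=4.0):
    """Smooth a 1-D signal with a scalar Kalman filter (constant-state model).

    process_var and meas_var are assumed noise variances, not tuned values.
    """
    estimate = measurements[0]  # initial state guess: first measurement
    error = 1.0                 # initial estimate variance (assumed)
    smoothed = []
    for z in measurements:
        error += process_var               # predict: uncertainty grows
        gain = error / (error + meas_var)  # Kalman gain, always in (0, 1)
        estimate += gain * (z - estimate)  # correct toward the measurement
        error *= 1.0 - gain                # uncertainty shrinks after update
        smoothed.append(estimate)
    return smoothed


def min_max(values):
    """Rescale values linearly into [0, 1]; a constant series maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


glucose = [100, None, 104, 98, None, 102]  # hypothetical readings (mg/dL)
filled = replace_missing(glucose)          # [100, 101.0, 104, 98, 101.0, 102]
smoothed = kalman_1d(filled)
normalized = min_max(smoothed)
print([round(v, 3) for v in normalized])
```

Because each update is a weighted average of the previous estimate and the new reading, the smoothed signal never leaves the range of the (imputed) measurements.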

In the preprocessing of medical records (MR), the MR data consists of complete patient records in digital form. It includes various medical data describing the patient's health, such as laboratory tests, self-test answers, and medications taken. Laboratory tests are data from medical devices that can be used to judge a patient's health status against reference values. The assessment of a patient's health status may also vary with the patient's medical history and family medical history. Self-diagnosis data consists of information that cannot be obtained from personal medical devices, such as the duration of indigestion, together with drinking and smoking habits. Drug data contains information about the drugs prescribed after diagnosis. Some MR attributes can be used to classify patients, so an ID and a reference value (0, 1, or 2) are assigned to each MR attribute for data analysis; the reference values 0, 1, and 2 denote the patient's health status (normal, prediabetes, and diabetes, respectively). MR data is also used to overcome the limitations of sensor-generated data; for example, missing values in the data set are replaced with the corresponding current MR attribute values.
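The source does not state the cut-offs used to assign the reference values 0, 1, and 2. As a sketch, the code below assumes the American Diabetes Association diagnostic thresholds for HbA1c (MR2) and fasting blood glucose (MR3), and shows the described fallback of filling a missing sensor reading from the MR value. The function names and record keys are hypothetical.

```python
def hba1c_class(hba1c_percent: float) -> int:
    """0 = normal (< 5.7 %), 1 = prediabetes (5.7-6.4 %), 2 = diabetes (>= 6.5 %).

    Thresholds assumed from ADA diagnostic criteria.
    """
    if hba1c_percent >= 6.5:
        return 2
    if hba1c_percent >= 5.7:
        return 1
    return 0


def fasting_glucose_class(mg_dl: float) -> int:
    """0 = normal (< 100), 1 = prediabetes (100-125), 2 = diabetes (>= 126) mg/dL."""
    if mg_dl >= 126:
        return 2
    if mg_dl >= 100:
        return 1
    return 0


def encode_record(record):
    """Attach reference values to MR attributes, using the IDs from Table 1."""
    return {
        "MR2": hba1c_class(record["hba1c"]),
        "MR3": fasting_glucose_class(record["fasting_glucose"]),
    }


def fill_from_mr(sensor_value, mr_value):
    """Use the MR value when the sensor reading is missing (None)."""
    return mr_value if sensor_value is None else sensor_value


print(encode_record({"hba1c": 6.0, "fasting_glucose": 130}))  # {'MR2': 1, 'MR3': 2}
print(fill_from_mr(None, 118))  # 118
```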

Pre-processing of social network content is an important step before text classification. It reduces the noise in the corpus data and transforms the data into representations useful for ML classifiers. The proposed system collects social network content and drug reviews and stores them in HDFS. The following steps are applied to transform the data into a structured form. Because these steps remove unneeded data, they also make it easier to extract features and opinion words.

Words such as prepositions (to, in, of), articles (a, an, the), symbols (#, @, etc.), and URLs in the corpus data add little to the meaning of a document, yet they reduce the accuracy of text classification. It is therefore essential to remove them to reduce noise in the text. This content is deleted using a well-known method called Rainbow.
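The same kind of cleanup can be sketched with regular expressions. This is not the Rainbow toolkit; the stop-word set and patterns below are a small hypothetical stand-in covering the examples named in the text.

```python
import re
from typing import List

# Hypothetical, deliberately small stop-word list built from the text's examples.
STOP_WORDS = {"to", "in", "of", "and", "a", "an", "the"}
URL_RE = re.compile(r"https?://\S+|www\.\S+")
SYMBOL_RE = re.compile(r"[#@]")


def remove_noise(text: str) -> List[str]:
    """Lower-case, drop URLs and #/@ symbols, then filter stop words."""
    text = URL_RE.sub(" ", text.lower())
    text = SYMBOL_RE.sub(" ", text)
    return [w for w in text.split() if w not in STOP_WORDS]


tokens = remove_noise("Took the #metformin in the morning https://t.co/x @doc")
```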

Tokenization splits the complex text of the corpus into small terms, or tokens, by removing white spaces and separators, which commonly occur in complex text. An n-gram tokenizer is therefore applied to delete the white-space separators. The output is then saved for further analysis, such as part-of-speech (PoS) tagging and lemmatization of the extracted words.
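A minimal sketch of splitting text on white space and separators and then forming word n-grams. The separator set is an assumption; the patent does not specify which n-gram tokenizer it uses.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Split on white space and common punctuation separators; drop empty pieces."""
    return [t for t in re.split(r"[\s,;:.!?]+", text) if t]


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Contiguous word n-grams joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


words = tokenize("Blood sugar, very high!")
bigrams = ngrams(words, 2)
```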

PoS tagging identifies the part of speech of each word in the text. The corpus text is split into sentences, and Stanford CoreNLP (Core Natural Language Processing) is then used for PoS tagging. After tagging, each sentence is checked to confirm that it contains a complete clause with a noun and a verb.

Stemming and lemmatization: stemming reduces the words in the corpus text to their base form. The system applies a suffix-dropping algorithm for stemming. Lemmatization captures the lexical form of the words used in the text; after lemmatization, the system can easily obtain lexical information for each word. For example, "blood sugar" is related to "blood glucose". The stems and lemmas are then used for further processing.
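A suffix-dropping stemmer can be sketched in a few lines. The suffix list, the minimum stem length, and the phrase table mapping "blood sugar" to "blood glucose" are all hypothetical; a real system would use an established stemmer and a lexical resource rather than this toy table.

```python
from typing import Dict

SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical suffix list
LEMMA_MAP: Dict[str, str] = {"blood sugar": "blood glucose"}  # hypothetical table


def strip_suffix(word: str, min_stem: int = 3) -> str:
    """Drop the longest matching suffix if the remaining stem is long enough."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def normalize_terms(text: str) -> str:
    """Rewrite known phrases to a canonical form."""
    for phrase, lemma in LEMMA_MAP.items():
        text = text.replace(phrase, lemma)
    return text
```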

In feature transformation: patients use unusual spellings on social networks (e.g., "depressssssed") that affect the classifier's results. Runs of a repeated character are therefore converted back to the normal word (e.g., "depressssed" becomes "depressed").
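One simple way to implement this normalization, assuming any run of three or more identical characters is reduced to exactly two (which turns "depressssssed" into "depressed" while leaving normal double letters untouched):

```python
import re


def squeeze_repeats(word: str) -> str:
    """Collapse every run of 3+ identical characters to exactly two."""
    return re.sub(r"(.)\1{2,}", r"\1\1", word)


fixed = squeeze_repeats("depressssssed")
```

The rule cannot tell whether the intended word has one or two letters at that position (for example, "sooooo" becomes "soo"); resolving that would need a dictionary lookup.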

In feature polarity identification and document labeling, the proposed system detects patients' stress and depression from content posted on social networks. It also analyzes drug reviews to capture patients' opinions on the effectiveness and side effects of diabetes treatments. A sentiment analysis approach is used for both tasks, so identifying feature polarity and labeling documents are essential for distinguishing sentiment. After the social network and web page content is pre-processed, SentiWordNet (SWN) is used to determine the polarity of each feature's opinion words. The feature polarities are then accumulated to obtain the polarity of the entire document. SWN is a lexical resource that associates each WordNet synset with three numerical scores: positive, negative, and objective. However, SWN contains separate senses for each word depending on whether it is used as a noun, verb, adjective, or adverb. Word Sense Disambiguation (WSD) is therefore used to extract the sense category required for each word. If SWN has no entry for a word, a value of 0 is assigned to it. After WSD, the SWN scores for the selected sense of each opinion word are extracted, and the polarity of each feature is calculated using the following equations.

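$$\mathrm{Pos\_score}(F_i) = \frac{1}{|F_i|} \sum_{w \in F_i} \mathrm{Pos\_score}_{SWN}(w) \quad (1)$$

$$\mathrm{Neg\_score}(F_i) = \frac{1}{|F_i|} \sum_{w \in F_i} \mathrm{Neg\_score}_{SWN}(w) \quad (2)$$

$$\mathrm{Neu\_score}(F_i) = \frac{1}{|F_i|} \sum_{w \in F_i} \mathrm{Neu\_score}_{SWN}(w) \quad (3)$$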

Here, Pos_score_SWN(w), Neg_score_SWN(w), and Neu_score_SWN(w) represent the positive, negative, and neutral SWN scores of a word w, and each feature score is calculated as the arithmetic mean of the SWN scores of the individual words w. If Pos_score(F_i) is greater than both Neg_score(F_i) and Neu_score(F_i), the system regards the feature polarity as positive. Conversely, if Neg_score(F_i) is greater than both Pos_score(F_i) and Neu_score(F_i), the feature polarity is negative. Finally, if Neu_score(F_i) is greater than both Pos_score(F_i) and Neg_score(F_i), the feature is considered neutral.
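The averaging and decision rule can be sketched as follows. The lexicon entries are hypothetical (positive, negative, objective) triples standing in for SWN scores after WSD, the objective score is used as the neutral score, and ties fall through to "neutral", which is an assumption because the text does not cover ties.

```python
from statistics import mean
from typing import Dict, List, Tuple

Scores = Tuple[float, float, float]  # (positive, negative, neutral/objective)

# Hypothetical scores for already-disambiguated senses; not real SWN values.
LEXICON: Dict[str, Scores] = {
    "effective": (0.75, 0.0, 0.25),
    "nausea": (0.0, 0.5, 0.5),
    "dizzy": (0.0, 0.625, 0.375),
}


def feature_polarity(opinion_words: List[str], lexicon: Dict[str, Scores]) -> str:
    """Average per-word scores for one feature and pick the dominant polarity."""
    # Words missing from the lexicon get 0 for every score, as described above.
    scores = [lexicon.get(w, (0.0, 0.0, 0.0)) for w in opinion_words]
    pos = mean(s[0] for s in scores)
    neg = mean(s[1] for s in scores)
    neu = mean(s[2] for s in scores)
    if pos > neg and pos > neu:
        return "positive"
    if neg > pos and neg > neu:
        return "negative"
    return "neutral"
```

The function expects at least one opinion word per feature; `statistics.mean` raises an error on an empty list.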

FIG. 4 is a diagram for explaining word embedding and ontology-based feature extraction according to an embodiment of the present invention.

In word embedding and ontology-based feature extraction from text data, word embedding is a word representation method in which the words of a document are encoded as numeric values for advanced analysis. Two well-known approaches can be applied to word representation: one-hot encoding and embedding. In this work, a word embedding approach is applied to represent words as numeric values. After the dimensionality d is set, each word is represented by a d-dimensional vector. For example, when d is set to 3, "metformin" is represented as {0.3, 0.6, 0.8}. This method greatly reduces the dimensionality. However, such a simple embedding does not capture the relationships between words in the matrix. Therefore, a neural network-based word embedding model called Word2vec is used for word representation. Word2vec includes two architectures: the continuous bag-of-words (CBoW) model and the skip-gram model. The CBoW model predicts a word from its surrounding context, whereas the skip-gram model uses the current word to predict its surrounding context, as shown in FIG. 4.
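To make the skip-gram objective concrete, the sketch below generates the (center, context) training pairs a skip-gram model learns from, using a symmetric window. The window size is an illustrative assumption. CBoW uses the same windows in the opposite direction, predicting the center word from its context.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """(center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


pairs = skipgram_pairs(["metformin", "lowers", "glucose"], window=1)
```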

The skip-gram model of Word2vec was trained on the data set using 200-dimensional vectors. It identifies relationships between words and helps predict the neighboring words of an input word. This Word2vec model is used to train an ML classifier to detect associations between features. In general, however, ML classifiers miss the underlying semantics of features that are specific to a particular domain. Ontologies can, in some cases, represent the semantics of a given domain.
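Once vectors are trained, relationships between words are usually read off with cosine similarity. The sketch uses hypothetical 3-dimensional vectors in place of the 200-dimensional embeddings described above.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest(word: str, vectors: Dict[str, List[float]], k: int = 1) -> List[Tuple[str, float]]:
    """Return the k words most similar to `word`, excluding the word itself."""
    target = vectors[word]
    others = [(w, cosine(target, v)) for w, v in vectors.items() if w != word]
    return sorted(others, key=lambda p: p[1], reverse=True)[:k]


# Hypothetical toy embeddings.
vectors = {
    "metformin": [0.3, 0.6, 0.8],
    "insulin": [0.35, 0.55, 0.75],
    "walking": [0.9, -0.2, 0.1],
}
```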

An ontology aims to provide semantic knowledge of the concepts in a specific domain and their relationships. Biomedical ontologies available online cover a variety of topics related to diabetes, depression, hypertension, drugs, and food. The present invention uses domain-specific ontologies in which each class is a concept or feature of the domain and each property is a relationship between features. This is the basic representation of the online biomedical ontologies available from the National Center for Biomedical Ontology (NCBO) BioPortal, and it is a common way to formalize knowledge about features.

Ontology (abbreviation) | Definition | Number of classes | Number of properties | Number of individuals
Ontology for Nutritional Studies (ONS) | Describes complex nutrition studies. | 3442 | 66 | 104
BioMedBridges Diabetes Ontology (DIAB) | Represents relationships between diabetes phenotypes for text mining. | 375 | 4 | 0
Diabetes Mellitus Treatment Ontology (DMTO) | Provides interoperability for diabetes treatment. | 10700 | 315 | 63
Human Disease Ontology (DOID) | Represents the concepts of rare diseases. | 12694 | 15 | 0
Drug Target Ontology (DTO) | Provides information on the classification of drug target data. | 10075 | 0 | 0
FHIR and SSN-based Type 1 Diabetes Ontology (FASTO) | Provides insulin management information for diabetic patients. | 9577 | 822 | 460

As shown in Table 2, the system uses the latest versions of ontologies rich in concepts and relationships: the Ontology for Nutritional Studies (ONS), the BioMedBridges Diabetes Ontology (DIAB), the Diabetes Mellitus Treatment Ontology (DMTO), the Human Disease Ontology (DOID), the Drug Target Ontology (DTO), and the Fast Healthcare Interoperability Resources (FHIR) and Semantic Sensor Network (SSN)-based Type 1 Diabetes Ontology (FASTO). These ontologies give the neural network model additional information for understanding the semantic meaning of unusual words. They also contribute word-level semantics to word embedding and text classification.

The system retrieves all ontology information and attempts to integrate it with the word embeddings used by the classifier. A Bag-of-Words (BoW) model is typically used for features, so each ontology is treated as a BoW. To extract information from the ontologies, well-known statistical measures are used: term frequency (TF), inverse document frequency (IDF), and TF-IDF. Here, each concept or feature of an ontology is treated as a term, and each ontology as a document. TF is thus the frequency with which a term occurs in an ontology, and it is defined mathematically in the following equation.

(4)

If TF(Term, Onto) is greater than 0, this step is repeated to extract the specific concept. TF-IDF selects representative terms for information extraction. IDF reduces the weight of words that appear in all ontologies (e.g., patient, disease, hospital). When a term occurs in many ontologies, or in many concepts within a single ontology, it is a common term and may not be needed for information extraction. In that case, the logarithm in the IDF approaches zero, so the TF-IDF value of that term is small. IDF is defined statistically in the following equation.

$$\mathrm{IDF}(Term, Ontos) = \log \frac{|Ontos|}{|\{Onto \in Ontos : Term \in Onto\}|} \quad (5)$$

where |Ontos| denotes the total number of ontologies in the database, or the total number of concepts in an ontology, and |{Onto ∈ Ontos : Term ∈ Onto}| is the number of ontologies, or of concepts within an ontology, in which the term appears. Common terms are then removed using the following TF-IDF equation.

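$$\mathrm{TF\text{-}IDF}(Term, Onto, Ontos) = \mathrm{TF}(Term, Onto) \times \mathrm{IDF}(Term, Ontos) \quad (6)$$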

The TF-IDF result shows how important an ontology term is to an ontology within the ontology corpus, or to a concept within an ontology.
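A minimal sketch of TF-IDF over ontologies, treating each ontology as a document and each concept label as a term. The ontology contents are hypothetical, TF is assumed to be a raw occurrence count (the exact TF formula is not reproduced in this text), and IDF uses the natural logarithm. A term that appears in every ontology, such as "patient" here, gets an IDF of zero and is therefore filtered out.

```python
import math
from typing import Dict, List

# Hypothetical concept lists, one per ontology.
ONTOS: Dict[str, List[str]] = {
    "DIAB": ["diabetes", "hyperlipidemia", "patient"],
    "DOID": ["disease", "patient", "hyperlipidemia"],
    "DTO": ["drug", "patient"],
}


def tf(term: str, onto: List[str]) -> int:
    """Raw count of the term in one ontology (assumed TF definition)."""
    return onto.count(term)


def idf(term: str, ontos: Dict[str, List[str]]) -> float:
    """log(|Ontos| / number of ontologies containing the term); 0 if absent."""
    containing = sum(1 for concepts in ontos.values() if term in concepts)
    return math.log(len(ontos) / containing) if containing else 0.0


def tf_idf(term: str, name: str, ontos: Dict[str, List[str]]) -> float:
    return tf(term, ontos[name]) * idf(term, ontos)
```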

FIG. 5 is a diagram for describing LSTM-based classification and prediction according to an embodiment of the present invention.

The top concepts can be extracted based on the TF-IDF values of the terms (X = 2). Assume that the system obtains text related to diabetes, as shown in FIG. 6. In FIG. 5, ambiguous words for information extraction are shown in bold (e.g., hyperlipidemia). Once semantic information for these words is identified, the system extracts the corresponding concept from the ontology and provides it as additional information for word embedding and the ML classifiers (e.g., hyperlipidemia is a complication of the disease).

In feature extraction from wearable sensor data, the sheer volume of data is another major issue for sensor-based medical monitoring systems. The large amount of raw data collected from the patient's body places a heavy burden on data processing, so it is important to reduce the data size without losing useful information. The data set contains many attributes for patients with diabetes and BP, but not all of them are needed for patient classification; unnecessary attributes increase processing time and reduce classification accuracy. Various methods, such as the dragonfly algorithm and recursive feature elimination, are used for feature selection. Here, the information gain (IG) method is used, which improves the classifier by reducing noisy and irrelevant features.

Information gain selects features based on the information they contribute about the class variable, without considering attribute interactions. Every attribute or feature of a data set has some importance, and based on that importance the system can learn about a particular problem. Information gain is calculated using WEKA (Waikato Environment for Knowledge Analysis), which includes various methods for data processing. The results are obtained by applying the information gain filter "InfoGainAttributeEval" as the attribute evaluator for the diabetes and BP data sets. However, IG cannot be applied directly to numeric data, so numeric data must be converted to nominal data before IG is used. Once the attribute values are identified, the IG measure corresponds to the reduction in entropy of the training data set. This approach ranks attribute values by calculating their IG with respect to the classification. The proposed IG method uses entropy to measure system uncertainty and computes the difference between the prior and posterior entropy. As shown in Equation 7, it specifies the amount of additional information about A provided by B.

$$IG(A \mid B) = H(A) - H(A \mid B) \quad (7)$$

Here, A and B are distinct variables. A is a feature, and its prior entropy can be measured using Equation 8.

$$H(A) = -\sum_{i} P(a_i) \log_2 P(a_i) \quad (8)$$

where P(a_i) represents the prior probability of each discrete value a_i of A. The conditional (posterior) entropy of A, after B is given, can be defined as in Equations 9 and 10.

$$H(A \mid b_j) = -\sum_{i} P(a_i \mid b_j) \log_2 P(a_i \mid b_j) \quad (9)$$

$$H(A \mid B) = \sum_{j} P(b_j)\, H(A \mid b_j) \quad (10)$$

The information gain can be calculated by substituting Equations 8 and 10 into Equation 7, as in Equation 11.

$$IG(A \mid B) = -\sum_{i} P(a_i) \log_2 P(a_i) + \sum_{j} P(b_j) \sum_{i} P(a_i \mid b_j) \log_2 P(a_i \mid b_j) \quad (11)$$
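Equations 7 to 11 can be checked numerically on discretized data. This sketch estimates the probabilities from value counts; it is not WEKA's InfoGainAttributeEval, and the toy columns are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List


def entropy(values: List[Hashable]) -> float:
    """H(A) = -sum P(a) log2 P(a), probabilities estimated from counts."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(a: List[Hashable], b: List[Hashable]) -> float:
    """H(A|B) = sum_j P(b_j) H(A | b_j)."""
    n = len(b)
    total = 0.0
    for b_val, count in Counter(b).items():
        subset = [ai for ai, bi in zip(a, b) if bi == b_val]
        total += (count / n) * entropy(subset)
    return total


def information_gain(a: List[Hashable], b: List[Hashable]) -> float:
    """IG(A|B) = H(A) - H(A|B)."""
    return entropy(a) - conditional_entropy(a, b)


label = ["yes", "yes", "no", "no"]
ig_perfect = information_gain(label, ["high", "high", "low", "low"])   # B fully determines A
ig_useless = information_gain(label, ["x", "y", "x", "y"])             # B tells nothing about A
```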

Principal component analysis

The dimensionality of the data set increases after pre-processing and the creation of additional attributes. This causes problems such as reduced classification accuracy, poor model fit, and increased time complexity. Therefore, principal component analysis (PCA), a statistical approach to dimensionality reduction, is used. It converts p-dimensional data X into q-dimensional data Y with minimal loss. The main goal of PCA is to find the projection vectors with the highest variance, and eigenvalues are used to represent the resulting vectors. The projection of a vector A onto a vector B in the same plane can be expressed using Equation 12.

$$\mathrm{proj}_{B} A = \frac{A \cdot B}{\lVert B \rVert^{2}}\, B \quad (12)$$

After the projection vectors are obtained, the covariance matrix of the data is computed and decomposed into eigenvectors and eigenvalues. The eigenvalues are sorted in descending order, and the matrix columns are then ordered using the eigenvectors u corresponding to the sorted eigenvalues v. This approach yields the best projection, given by the projection matrix W. The reduced data set is obtained by multiplying the transpose of the projection matrix, W^T, by the data X, as in Equation 13.

$$Y = W^{\top} X \quad (13)$$
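The eigen-decomposition steps above can be sketched with NumPy. Here samples are stored as rows, so the projection is written `Xc @ W`, which is the row-wise equivalent of Equation 13 (where samples are columns). The random input is only for illustration.

```python
import numpy as np


def pca_reduce(X: np.ndarray, q: int):
    """Project (n_samples, p) data onto its top-q principal components.

    Returns the (n_samples, q) reduced data and all eigenvalues in descending order.
    """
    Xc = X - X.mean(axis=0)                  # centre each feature
    cov = np.cov(Xc, rowvar=False)           # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # re-order to descending
    W = eigvecs[:, order[:q]]                # p x q projection matrix
    return Xc @ W, eigvals[order]


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
Y, eigvals = pca_reduce(X, 2)
```

The sample variance of each projected column equals the corresponding eigenvalue, which is why sorting eigenvalues in descending order keeps the highest-variance directions first.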

In Bi-LSTM-based diabetes and BP classification and drug side-effect prediction: machine learning approaches such as CNN, MLP, SVM, logistic regression, decision trees, and KNN are useful algorithms for classification, feature selection, data normalization, and statistical analysis. These approaches perform their tasks using supervised and unsupervised methods. However, when the data size is large, they take a long time and their performance decreases. Therefore, the proposed medical monitoring system uses a Bi-LSTM to classify patients' diabetes, BP, mental health, and the side effects of diabetes drugs. The recommendation system is activated by diabetes, high blood pressure, stress, depression, or severe side effects of diabetes medications. FIG. 5 shows the structure of the Bi-LSTM for the classification of diabetes, BP, mental health, and drug side effects.

A grid search optimization algorithm is applied to identify the optimal values of the LSTM model's hyperparameters. The optimal values chosen for the dropout rate, number of epochs, batch size, and learning rate are 0.3, 30, 32, and 0.001, respectively. As shown in FIG. 6, LSTMs are used to classify four different types of data sets. For diabetes classification, the Pima Indians Diabetes data set was collected from the UCI Machine Learning Repository. It contains the records of 768 patients, of whom 268 tested positive for diabetes and 500 tested negative. The data set has 8 input attributes, but only six are used to train the diabetes classification model: age, BMI, BP, plasma glucose, diabetes pedigree function, and 2-hour serum insulin. The BP classification model was trained using the PhysioNet Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database, which consists of many samples of vital signs, including BP and heart rate (HR). The data set for predicting adverse drug events was collected from the UCI repository. It consists of six attributes, but only two are considered in the proposed work: the drug name and the patient review.
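A grid search simply evaluates every combination in a hyperparameter grid and keeps the best one. In this sketch only the selected values (0.3, 30, 32, 0.001) come from the text; the other candidate values are hypothetical, and `evaluate` stands in for training a Bi-LSTM and returning its validation score.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical search space that includes the values reported above.
GRID: Dict[str, List[Any]] = {
    "dropout": [0.2, 0.3, 0.5],
    "epochs": [10, 30, 50],
    "batch_size": [16, 32, 64],
    "learning_rate": [0.01, 0.001],
}


def grid_search(evaluate: Callable[[Dict[str, Any]], float],
                grid: Dict[str, List[Any]]) -> Tuple[Optional[Dict[str, Any]], float]:
    """Try every combination and return the one with the highest score."""
    best_params, best_score = None, float("-inf")
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = evaluate(params)  # e.g. validation accuracy of a trained model
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_evaluate(p: Dict[str, Any]) -> float:
    """Stand-in scorer that peaks at the reported configuration."""
    return -(abs(p["dropout"] - 0.3) + abs(p["epochs"] - 30) / 100
             + abs(p["batch_size"] - 32) / 100 + abs(p["learning_rate"] - 0.001))


best, score = grid_search(toy_evaluate, GRID)
```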

An LSTM is a type of RNN composed of five main components: a memory cell, an input gate, a forget gate, a current (candidate) memory cell, and an output gate. These components control how previous data are updated and used. The LSTM output can be calculated using the following equations.

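$$i^t = \sigma\left(w_z^{i} X^t + w_h^{i} h^{t-1} + b^{i}\right) \quad (14)$$

$$f^t = \sigma\left(w_z^{f} X^t + w_h^{f} h^{t-1} + b^{f}\right) \quad (15)$$

$$o^t = \sigma\left(w_z^{o} X^t + w_h^{o} h^{t-1} + b^{o}\right) \quad (16)$$

$$\tilde{c}^t = \tanh\left(w_z^{c} X^t + w_h^{c} h^{t-1} + b^{c}\right) \quad (17)$$

$$c^t = f^t \odot c^{t-1} + i^t \odot \tilde{c}^t \quad (18)$$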

where X^t, w_z and w_h, and b are the LSTM input, weight matrices, and bias vector, respectively, and tanh(·) and σ(·) are the hyperbolic tangent and sigmoid functions, respectively. The final output of the LSTM can be calculated using the following equation.

$$h^t = o^t \odot \tanh\left(c^t\right) \quad (19)$$

where the product is a feature-wise (element-wise) multiplication between the output gate and the cell state. The LSTM output is connected to a softmax function to identify the probabilistic output (0, 1, 2), where 0, 1, and 2 indicate that the patient is normal, pre-diabetic, or diabetic, respectively. The softmax function can be calculated using the following equation.

(20)

where c and x^{st} are the feature category and the input at time step k, respectively. After pre-processing, feature extraction, dimensionality reduction, and word embedding, a series of inputs (features and word vectors) representing patients' physiological information and drug reviews is fed into the Bi-LSTM layer. Four Bi-LSTM-based classifier models were developed: LSTM^a, LSTM^b, LSTM^c, and LSTM^d.
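The LSTM equations and the softmax output can be traced with a small NumPy forward pass. This is an untrained sketch, not the patent's model: the gate stacking order (i, f, o, candidate), the hidden size, the random weights, and the dense output layer on the concatenated forward and backward final states are all assumptions made for illustration. Each time step receives one scalar feature, mirroring how features are fed to successive LSTM units.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, Wz, Wh, b):
    """One LSTM step. Wz: (4H, D), Wh: (4H, H), b: (4H,); gates stacked i, f, o, c~."""
    H = h_prev.shape[0]
    z = Wz @ x_t + Wh @ h_prev + b
    i = sigmoid(z[0:H])                # input gate
    f = sigmoid(z[H:2 * H])            # forget gate
    o = sigmoid(z[2 * H:3 * H])        # output gate
    c_tilde = np.tanh(z[3 * H:])       # candidate memory cell
    c = f * c_prev + i * c_tilde       # element-wise cell update
    h = o * np.tanh(c)                 # element-wise output
    return h, c


def softmax(v: np.ndarray) -> np.ndarray:
    e = np.exp(v - v.max())            # shift for numerical stability
    return e / e.sum()


def bilstm_classify(xs, params_fwd, params_bwd, W_out, b_out):
    """Run the sequence forwards and backwards, concatenate final states, softmax."""
    H = params_fwd[1].shape[1]

    def run(seq, p):
        h, c = np.zeros(H), np.zeros(H)
        for x_t in seq:
            h, c = lstm_step(x_t, h, c, *p)
        return h

    h_cat = np.concatenate([run(xs, params_fwd), run(xs[::-1], params_bwd)])
    return softmax(W_out @ h_cat + b_out)  # P(normal), P(pre-diabetes), P(diabetes)


rng = np.random.default_rng(0)
H, D = 4, 1


def random_params():
    return rng.normal(size=(4 * H, D)), rng.normal(size=(4 * H, H)), np.zeros(4 * H)


features = rng.normal(size=(6, D))  # six features, one per time step
probs = bilstm_classify(features, random_params(), random_params(),
                        rng.normal(size=(3, 2 * H)), np.zeros(3))
```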

In the present invention, wearable devices and smartphones are used to collect patients' personal physiological information. The collected data are pre-processed and transformed into a structured form for further processing. Diabetes classification uses age, family history, sex, activity, BMI, blood pressure, and blood glucose features. At each time step t, the feature X^t is input to an LSTM^a unit to predict the diabetes class {normal, pre-diabetes, diabetes}.

The first LSTM unit of the LSTM^a model can make a prediction based on a particular feature of the patient category. However, all features should be used to obtain the information that is important for classification. Therefore, the diabetic features are fed in sequence, each to the next LSTM unit, which also receives the result of the previous unit. This procedure is repeated for each input feature. In this way, LSTM^a captures the valuable features and produces an output, which is passed to a softmax activation function that predicts the diabetic category. The medical rules for classifying patients with diabetes and blood pressure are presented in Table 3.

For blood pressure and electrocardiography, wearable sensors are used to identify each patient's BP and heart rate. As shown in Table 3, patients can be evaluated from their systolic and diastolic BP values in millimeters of mercury (mmHg) and their HR in beats per minute. However, these parameters can be affected by physical activity, sleep, temperature, stress, eating habits, and so on. Patients may also have BP and HR problems due to other factors, such as family profile and disease history. For example, HR is always elevated during exercise, which means that an apparently abnormal reading is not always dangerous. It is therefore essential to detect abnormal conditions using the patient's medical records and daily activities. Key features such as age, gender, and family history are selected from the medical records, including the diagnosis and treatment history, and all diagnostic and medical records are searched to extract information related to diabetes and BP.

The extracted information is used for feature construction. For example, if BP data appear in the patient's family profile, the family attribute is set to 1. If no BP data are found in the diagnostic and medical records, a null value is used for that feature; all null values are later replaced with 0. After the features were created and pre-processed, a Bi-LSTM-based model named LSTM^b was developed, which can retain the temporal patterns present in past sequential data.
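The flag-then-fill rule above can be sketched as follows. The feature names, the "BP" marker, and the substring search over record text are hypothetical simplifications of how BP information might be located in a family profile and in diagnostic records.

```python
from typing import Dict, List, Optional, Set


def build_bp_features(family_profile: Set[str], records: List[str]) -> Dict[str, int]:
    """Set a flag to 1 when BP information is found, None otherwise, then map None to 0."""
    features: Dict[str, Optional[int]] = {
        "family_bp": 1 if "BP" in family_profile else None,
        "bp_history": 1 if any("BP" in r for r in records) else None,
    }
    # Null values are replaced with 0 in a later pass.
    return {k: 0 if v is None else v for k, v in features.items()}
```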

For BP classification, all parameters and features generated at each time step t are converted into the category vector {0, 1, 2}.

In mental health monitoring, patient messages and posts are extracted from social networks and filtered using supervised and unsupervised approaches. However, texts related to depression and stress are usually short and unstructured, contain negative emotions, and have low sentiment polarity. Text mining and ML approaches are therefore used to process social network content efficiently and to identify the sentiment polarity of texts in terms of happiness, normal, depression, or stress. Sentiment-bearing text is identified first, and sentiment analysis is then applied to classify the text. In this work, a Word2vec model with ontologies is used to represent the text for a deep learning classifier. After a 200-dimensional Word2vec model was trained, the word sequences were fed into the LSTM^c model. The output of LSTM^c is then passed to a softmax function, which predicts a sentiment label (e.g., positive, neutral, negative, or strong negative) for each sentence of the posted content. The rules for classifying a patient's mental health based on the predicted polarity are shown in Table 4.

Category | Sentiment polarity of patient posts and comments (positive / neutral / negative / strong negative) | Sentiment polarity for antidiabetic drugs (positive / neutral / negative)
Patient mental health classification
Happiness | Positive | -
Normal | Neutral | -
Depressed | Negative | -
Stress | Strong negative | -
Drug side-effect prediction
No side effects | - | Positive
Moderate side effects | - | Neutral
Serious side effects | - | Negative
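The rules in Table 4 amount to two lookup tables. The sketch below encodes them and, as an assumption not stated in the text, combines per-sentence labels into one patient-level label by majority vote (ties resolve to the label that appears first).

```python
from collections import Counter
from typing import Dict, List

MENTAL_HEALTH_RULES: Dict[str, str] = {
    "positive": "happiness",
    "neutral": "normal",
    "negative": "depressed",
    "strong negative": "stress",
}
SIDE_EFFECT_RULES: Dict[str, str] = {
    "positive": "no side effects",
    "neutral": "moderate side effects",
    "negative": "serious side effects",
}


def classify_patient(sentence_labels: List[str], rules: Dict[str, str]) -> str:
    """Map the most frequent sentence polarity to a category (majority vote is assumed)."""
    majority = Counter(sentence_labels).most_common(1)[0][0]
    return rules[majority]
```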

In adverse drug reaction prediction, the diabetes and BP drugs the patient is currently taking are used as input queries, and reviews of them are retrieved from various websites. After filtering and pre-processing, the reviews are automatically labeled as having no side effects, moderate side effects, or severe side effects. For this labeling, the sentiment polarity of each sentence is identified. Then, as shown in Table 4, positive, neutral, and negative sentiment polarities are regarded as no side effects, moderate side effects, and serious side effects, respectively. A 200-dimensional Word2vec model was trained to represent the drug reviews as word sequences for the LSTM^d model. The output of LSTM^d is then fed to a softmax function, which predicts the drug side effects.

FIG. 6 is a flowchart illustrating a medical monitoring method using machine learning and semantic knowledge-based big data analysis according to an embodiment of the present invention.

The proposed medical monitoring method using machine learning and semantic knowledge-based big data analysis includes: collecting data gathered through wearable devices or sensors, patients' medical records, discussion data from social networks of patients and doctors, and data from medical web pages (610); storing the collected data in the data storage unit of the big data cloud server, which is connected to a personal cloud server (620); and analyzing the collected data through the big data analysis engine.

In step 610, data gathered through wearable devices or sensors, patients' medical records, discussion data from social networks of patients and doctors, and data from medical web pages are collected. Specifically, the patient's medical records are collected to determine treatment and medical history; the patient's content is extracted from social networking services (SNS) to identify the patient's feelings, emotions, and stress; and patient reviews of drugs are collected from medical web pages to identify side effects of the medications currently being taken.

In step 620, the collected data are stored in the data storage unit of the big data cloud server, which is connected to a personal cloud server.

The step of analyzing the collected data through the big data analysis engine comprises pre-analysis of the data using the big data analysis engine (630), word embedding and ontology-based feature extraction from text data (640), feature extraction from wearable sensor data (650), a statistical approach to dimensionality reduction through principal component analysis (660), and Bi-LSTM-based diabetes and BP classification and drug side-effect prediction (670).
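Steps 650 and 660 can be illustrated together: summarize each window of sensor readings with a few statistics, then project the feature vectors onto their first principal component. The choice of features (mean, standard deviation, minimum, maximum), the use of power iteration to find the component, and the simulated heart-rate windows are assumptions for illustration; the text only names feature extraction and principal component analysis.

```python
import math
import random


def window_features(samples):
    """Mean, standard deviation, minimum and maximum of one window of readings."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in samples) / n)
    return [mean, std, min(samples), max(samples)]


def first_principal_component(rows, iterations=200, seed=0):
    """Return column means and the top eigenvector of the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(iterations):
        # Power iteration: multiply by the covariance matrix and renormalize.
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all features are constant; no dominant direction
        v = [x / norm for x in w]
    return means, v


def project(rows, means, component):
    """Score of each row on the principal component (1-D reduced representation)."""
    return [sum((r[j] - means[j]) * component[j] for j in range(len(r))) for r in rows]


# Usage with simulated heart-rate windows (hypothetical data).
rng = random.Random(42)
windows = [[70 + 10 * k + rng.gauss(0, 2) for _ in range(30)] for k in range(6)]
features = [window_features(w) for w in windows]
means, pc1 = first_principal_component(features)
print([round(s, 2) for s in project(features, means, pc1)])
```

When features from different sensors have very different scales, standardizing each column before PCA usually matters more than the choice of eigen-solver.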

Here, the big data analysis engine performs data analysis including pre-analysis of the collected data, preprocessing and filtering of the collected sensor data, preprocessing of medical records, and preprocessing of social network content, and then performs feature polarity identification and document labeling.

More specifically, for pre-analysis the collected data are converted into CSV files so that they can be parsed and displayed as columns of numeric values; the IDs in the sensor data are replaced with the actual sensor names, and the data are filtered to remove inconsistencies and noise. The various medical records, including laboratory tests, self-test answers, and data on medications taken, are preprocessed, and social network content is preprocessed through stop word removal, tokenization, part-of-speech (PoS) tagging, and feature conversion, transforming the data into a structured form.
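A minimal sketch of the sensor and text preprocessing described above, using only the standard library. The CSV column names, the sensor ID mapping, the plausible value ranges used as a noise filter, and the stop-word list are all hypothetical; PoS tagging and feature conversion are omitted (a library such as NLTK, via `nltk.pos_tag`, could supply the tagging).

```python
import csv
import io
import re

# Hypothetical mapping from raw sensor IDs to human-readable names.
SENSOR_NAMES = {"S01": "heart_rate", "S02": "blood_glucose", "S03": "systolic_bp"}

# Hypothetical plausible ranges used to drop noisy or inconsistent readings.
VALID_RANGES = {"heart_rate": (30, 220), "blood_glucose": (20, 600), "systolic_bp": (60, 250)}

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "the", "is", "am", "i", "my", "and", "to", "of", "so", "today"}


def clean_sensor_csv(raw_text):
    """Parse CSV text, rename sensor IDs, and drop unknown, missing or out-of-range rows."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        name = SENSOR_NAMES.get(record["sensor_id"])
        if name is None:
            continue  # unknown sensor ID
        try:
            value = float(record["value"])
        except ValueError:
            continue  # missing or non-numeric reading
        low, high = VALID_RANGES[name]
        if low <= value <= high:  # also rejects NaN
            rows.append({"timestamp": record["timestamp"], "sensor": name, "value": value})
    return rows


def preprocess_post(text):
    """Lowercase, tokenize on letters/apostrophes, and remove stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


raw = "timestamp,sensor_id,value\n08:00,S01,72\n08:00,S02,\n08:01,S01,999\n08:02,S03,128\n"
print(clean_sensor_csv(raw))
print(preprocess_post("I am so tired and stressed today!"))  # ['tired', 'stressed']
```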

In addition, the discussion data from social networks of patients and doctors are used with a sentiment analysis approach to detect the patient's stress and depression, and reviews of the patient's medications are analyzed to gather opinions on the effectiveness and side effects of diabetes treatments.

A word embedding approach is applied to represent each word as a numeric vector: an embedding dimension is chosen to keep the representation compact, and each word is then represented by a vector of that dimension using Word2vec, a neural-network-based word embedding model.
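Before Word2vec training, text must be turned into training pairs, and before an LSTM can consume it, into fixed-length index sequences. The sketch below shows both steps, assuming the skip-gram variant of Word2vec (the text does not say skip-gram or CBOW) and a padding id of 0; the training itself would normally be delegated to a library, for example gensim 4.x `gensim.models.Word2Vec(sentences, vector_size=200, sg=1)`. The ontology-based features are not shown.

```python
from collections import Counter


def build_vocab(sentences, min_count=1):
    """Assign ids starting at 1 to frequent words; id 0 is reserved for padding."""
    counts = Counter(word for sentence in sentences for word in sentence)
    words = sorted(w for w, c in counts.items() if c >= min_count)
    return {w: i for i, w in enumerate(words, start=1)}


def skipgram_pairs(tokens, window=2):
    """(center, context) training pairs for the skip-gram variant of Word2vec."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def to_padded_ids(tokens, vocab, max_len):
    """Fixed-length id sequence for an LSTM; unknown words are dropped, 0 pads."""
    ids = [vocab[t] for t in tokens if t in vocab][:max_len]
    return ids + [0] * (max_len - len(ids))


corpus = [["high", "blood", "sugar"], ["low", "blood", "pressure"]]
vocab = build_vocab(corpus)
print(skipgram_pairs(corpus[0], window=1))
print(to_padded_ids(["high", "blood", "sugar", "today"], vocab, max_len=5))  # [2, 1, 5, 0, 0]
```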

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications running on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, the processing device is sometimes described as a single device, but those of ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or instruct the processing device independently or collectively. The software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, or computer storage medium or device, so as to be interpreted by the processing device or to provide instructions or data to it. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable through various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known to and usable by those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to a limited number of embodiments and drawings, those of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the described components, such as systems, structures, devices, and circuits, are combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims set out below.

Claims

1. A medical monitoring device comprising:
a data collection unit that collects data gathered through wearable devices or sensors, patients' medical records, discussion data from social networks of patients and doctors, and data from medical web pages;
a data storage unit of a big data cloud server, connected to a personal cloud server, that stores the collected data; and
a big data analysis engine that performs data analysis including pre-analysis of the collected data, preprocessing and filtering of the collected sensor data, preprocessing of medical records, and preprocessing of social network content, and that performs feature polarity identification and document labeling,
wherein the big data analysis engine extracts word embedding and ontology-based features from the text data of the stored data, extracts features from the wearable device or sensor data, performs a statistical approach to dimensionality reduction through principal component analysis, and performs Bi-LSTM-based diabetes and BP classification and drug side-effect prediction.

2. The medical monitoring device of claim 1, wherein the data collection unit
collects the patient's medical records to determine treatment and medical history, extracts the patient's content from SNS to identify the patient's feelings, emotions, and stress, and collects patient reviews of drugs from medical web pages to identify side effects of the medications currently being taken.

3. The medical monitoring device of claim 1, wherein the big data analysis engine
converts the collected data into CSV files so that they can be parsed for pre-analysis, displays them as columns of numeric values, replaces the IDs in the sensor data with the actual sensor names, filters the data to remove inconsistencies and noise, preprocesses the various medical records, including laboratory tests, self-test answers, and data on medications taken, and preprocesses social network content through stop word removal, tokenization, PoS tagging, and feature conversion, thereby transforming the data into a structured form.

4. The medical monitoring device of claim 1, wherein the big data analysis engine
uses the discussion data from social networks of patients and doctors with a sentiment analysis approach to detect the patient's stress and depression, and analyzes reviews of the patient's medications to gather opinions on the effectiveness and side effects of diabetes treatments.

5. The medical monitoring device of claim 1, wherein the big data analysis engine
applies a word embedding approach to represent each word as a numeric vector, sets an embedding dimension so as to reduce dimensionality, and represents each word with a vector of the set dimension using the neural-network-based Word2vec word embedding model.

6. A medical monitoring method comprising:
collecting data gathered through wearable devices or sensors, patients' medical records, discussion data from social networks of patients and doctors, and data from medical web pages;
storing the collected data in a data storage unit of a big data cloud server connected to a personal cloud server; and
performing, through a big data analysis engine, data analysis including pre-analysis of the collected data, preprocessing and filtering of the collected sensor data, preprocessing of medical records, and preprocessing of social network content, and performing feature polarity identification and document labeling,
wherein the performing of the data analysis comprises extracting word embedding and ontology-based features from the text data of the stored data, extracting features from the wearable device or sensor data, performing a statistical approach to dimensionality reduction through principal component analysis, and performing Bi-LSTM-based diabetes and BP classification and drug side-effect prediction.

7. The medical monitoring method of claim 6, wherein the collecting comprises
collecting the patient's medical records to determine treatment and medical history, extracting the patient's content from SNS to identify the patient's feelings, emotions, and stress, and collecting patient reviews of drugs from medical web pages to identify side effects of the medications currently being taken.

8. The medical monitoring method of claim 6, wherein the performing of the data analysis comprises
converting the collected data into CSV files so that they can be parsed for pre-analysis, displaying them as columns of numeric values, replacing the IDs in the sensor data with the actual sensor names, filtering the data to remove inconsistencies and noise, preprocessing the various medical records, including laboratory tests, self-test answers, and data on medications taken, and preprocessing social network content through stop word removal, tokenization, PoS tagging, and feature conversion, thereby transforming the data into a structured form.

9. The medical monitoring method of claim 6, wherein the performing of the data analysis comprises
using the discussion data from social networks of patients and doctors with a sentiment analysis approach to detect the patient's stress and depression, and analyzing reviews of the patient's medications to gather opinions on the effectiveness and side effects of diabetes treatments.

10. The medical monitoring method of claim 6, wherein the performing of the data analysis comprises
applying a word embedding approach to represent each word as a numeric vector, setting an embedding dimension so as to reduce dimensionality, and representing each word with a vector of the set dimension using the neural-network-based Word2vec word embedding model.