KR101462748B1

KR101462748B1 - Method for clustering health-information

Info

Publication number: KR101462748B1
Application number: KR1020130002601A
Authority: KR
Inventors: 이영구; 팜더안; 홍지혜
Original assignee: 경희대학교 산학협력단
Priority date: 2013-01-09
Filing date: 2013-01-09
Publication date: 2014-11-21
Also published as: KR20140090483A

Abstract

The present invention relates to an information clustering method. More specifically, it relates to a health information clustering method, specialized for health information, that clusters mutually similar health information using a health information database composed of health information with a per-patient treatment label and health information without a treatment label for a plurality of patients, so that training data can be generated from an ordinary health information database in which treatment labels hardly exist.

Description

Methods for clustering health information

In a ubiquitous environment, a user's health information is collected through various kinds of devices, and the collected information is stored in the database of a health care server through a communication network. On request or at a set interval, the user obtains treatment information on that health information from a professional advisor such as a doctor or a nurse, and uses it to manage his or her health.

User health information collected through the communication network in this way is stored in the database of a health care server. When a plurality of health care servers are operated, the health information of a plurality of users is distributed randomly across those servers. The health information stored in the health information databases of those servers contains only emotion label information, numerical label information, and symptom label information indicating the user's state; for most users, no treatment label information on the user's health state is stored. That is, treatment information is generally given to the user directly, through offline or telephone consultation between the professional advisor and the user, and professional advisors generally do not disclose their own treatment information to others.

Therefore, when training data is to be generated in order to provide treatment label information according to a user's health state in a health-related social network, almost all of the collected health information has no treatment label, and health information with a treatment label hardly exists. It is therefore difficult to generate training data that can be used for supervised learning techniques.
Korean Patent Laid-Open Publication No. 10-2014-0064471 is a prior patent related to the present invention.

The present invention is intended to solve the problem described above. An object of the present invention is to provide a method of clustering a health information database even when that database contains almost no treatment labels.

Another object of the present invention is to provide a treatment advice method that clusters health information data using a health information database in which treatment labels hardly exist, and provides a patient with the treatment label information found in the health information of the cluster most similar to the patient's health information.

Still another object of the present invention is to provide a method, specialized for health information whose distribution is not guaranteed to be linear, that clusters health information by a simple process based on the closeness of the health information.

To achieve these objects, a health information clustering method according to the present invention includes: generating, from a health information database composed of health information with a per-patient treatment label and health information without a treatment label for a plurality of patients, a label similarity matrix for each type of label information, including treatment label information, each matrix representing the similarity between patients for that label information; generating a total similarity matrix between patients from the sum of the label similarity matrices; generating preliminary cluster information of the plurality of patients by densifying, in the total similarity matrix, patients having mutually similar attributes so that the cluster density among them is smaller than a threshold; and clustering the plurality of patients into a set number of clusters based on the distribution characteristics of the preliminary cluster information.

Here, the similarity between patients for each type of label information is calculated as the Euclidean distance between the patients' label information of that type. The similarity of treatment label information is set to 1 when the Euclidean distance exceeds 0.
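The treatment label rule above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names and sample values are hypothetical, and returning 0 when the distance is exactly 0 is an assumption, since the text only states the case where the distance exceeds 0.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length lists of identification values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def treatment_entry(a, b):
    """Treatment label matrix entry for two patients.

    Set to 1 when the Euclidean distance exceeds 0, as the text states.
    Assumption: the entry is 0 when the distance is exactly 0.
    """
    return 1 if euclidean(a, b) > 0 else 0


# Hypothetical identification values for two treatment items.
same = treatment_entry([2, 0], [2, 0])
different = treatment_entry([2, 0], [1, 0])
print(same, different)
```

Under this reading, the treatment label matrix only records whether two patients received identical treatment labels or not.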

The types of label information include at least one of emotion label information, numerical label information, symptom label information, and treatment label information. Each of these is stored as identification values matched to the identifiers of the items that make up the label.

Preferably, the values of the elements of the label similarity matrices for the emotion label information, numerical label information, symptom label information, and treatment label information are normalized to the same first reference value.

Preferably, generating the preliminary cluster information includes: (c1) generating an initial vector normalized to a second reference value; (c2) setting the initial vector as the previous vector and generating a next vector from the product of the total similarity matrix and the previous vector; (c3) determining whether the difference between the previous vector and the next vector is smaller than a threshold; and (c4) generating the preliminary cluster information of the plurality of patients based on the next vector when that difference is smaller than the threshold. When the difference between the previous vector and the next vector is larger than the threshold, the next vector is set as the previous vector and steps (c2) to (c3) are repeated.

The preliminary cluster information is clustered into the set number of clusters by the K-means clustering algorithm, according to the distribution density characteristics of the preliminary cluster information.

A treatment advice method according to the present invention includes: generating, from a health information database composed of health information with a per-patient treatment label and health information without a treatment label for a plurality of patients, a label similarity matrix for each type of label information, including treatment label information, each matrix representing the similarity between patients for that label information; generating a total similarity matrix between patients from the sum of the label similarity matrices; generating preliminary cluster information of the plurality of patients by densifying, in the total similarity matrix, the health information of patients having mutually similar attributes, and clustering the plurality of patients into a set number of clusters based on the distribution density characteristics of the preliminary cluster information; generating cluster information for each cluster from the center values of the clustered patients' health information for each type of label information; and determining the similarity between a new patient's health information and the cluster information, and providing the treatment label information of the cluster information most similar to the new patient as the treatment label information for the new patient.

To achieve the objects of the present invention, a health information clustering apparatus according to the present invention includes: a label similarity matrix generation unit that generates, from a health information database composed of health information with a per-patient treatment label and health information without a treatment label for a plurality of patients, label similarity matrices representing the similarity between patients for each type of label information; a total similarity matrix generation unit that generates a total similarity matrix between patients from the sum of the label similarity matrices; a preliminary cluster information generation unit that generates preliminary cluster information of the plurality of patients by densifying, in the total similarity matrix, patients having mutually similar attributes so that the cluster density among them is smaller than a threshold; and a clustering unit that clusters the plurality of patients into a set number of clusters based on the distribution density characteristics of the preliminary cluster information.

Here, the label similarity matrix generation unit includes: an emotion matrix generation unit that calculates the similarity of emotion label information between patients and generates an emotion label similarity matrix; a numerical matrix generation unit that calculates the similarity of numerical label information between patients and generates a numerical label similarity matrix; a symptom matrix generation unit that calculates the similarity of symptom label information between patients and generates a symptom label similarity matrix; and a treatment matrix generation unit that calculates the similarity of treatment label information between patients and generates a treatment label similarity matrix.

The label similarity matrix generation unit further includes a matrix normalization unit that normalizes the emotion label matrix, the numerical label matrix, the symptom label matrix, and the treatment label matrix to the same first reference value.

Here, the preliminary cluster information generation unit includes: an initial vector generation unit that generates an initial vector normalized to a second reference value; a next vector generation unit that sets the initial vector as the previous vector and multiplies the previous vector by the total similarity matrix to generate a next vector; a densification unit that determines whether the difference between the previous vector and the next vector is smaller than a threshold and, when the difference is larger than the threshold, controls the next vector generation unit to set the next vector as the previous vector and generate a next vector again; and a preliminary cluster generation unit that, when the densification unit determines that the difference between the previous vector and the next vector is smaller than the threshold, generates the preliminary cluster information of the plurality of patients in the set number based on the next vector.

Compared with conventional clustering methods, the health information clustering method according to the present invention has the following effects.

First, the health information clustering method according to the present invention generates a total similarity matrix from label similarity matrices that represent the similarity between patients for the treatment label information and for each of the other types of health label information, and generates preliminary cluster information from the total similarity matrix. Health information can therefore be clustered even when the health information database contains almost no treatment labels.

Second, by clustering health information data from a health information database in which treatment labels hardly exist, the method can provide a patient with the treatment label information found in the health information of the cluster most similar to the patient's health information.

Third, the method is specialized for health information whose distribution is not guaranteed to be linear, and can cluster health information that has almost no treatment label information by a simple process based on the closeness of the health information.

FIG. 1 is a functional block diagram for explaining a health information clustering apparatus according to the present invention.
FIG. 2 is a functional block diagram for explaining the label similarity matrix generation unit according to the present invention in more detail.
FIG. 3 is a functional block diagram for explaining the preliminary cluster information generation unit according to the present invention in more detail.
FIG. 4 is a functional block diagram for explaining a treatment advice system that provides treatment label information to a new patient using training data of health information generated by the health information clustering apparatus according to the present invention.
FIG. 5 is a flowchart for explaining a health information clustering method according to the present invention.
FIG. 6 is a diagram for explaining an example of health label information stored in the health information database.
FIG. 7 is a flowchart for explaining in more detail the step of generating the label similarity matrices in the health information clustering method according to the present invention.
FIG. 8 is a flowchart for explaining in more detail the step of generating the preliminary cluster information in the health information clustering method according to the present invention.
FIG. 9 is a flowchart for explaining a treatment advice method according to the present invention.
FIG. 10 is a diagram for explaining an example of the K-means clustering algorithm.

Hereinafter, a health information clustering method and apparatus according to the present invention will be described in more detail with reference to the accompanying drawings.

FIG. 1 is a functional block diagram for explaining a health information clustering apparatus according to the present invention.

Referring to FIG. 1, the label similarity matrix generation unit (120) generates, from the health information of a plurality of patients stored in the health information database (110), label similarity matrices representing the similarity among the patients for each type of health label information. The health information stored in the health information database (110) is a mixture of health information for which a patient's treatment label exists and health information for which it does not. Here, the types of label information include emotion label information indicating the patient's emotional state, numerical label information indicating the patient's health measurements, symptom label information indicating the patient's symptoms, and treatment label information prescribed according to the patient's health state. Typically, some patients have every type of label information, while others are stored without treatment label information or without one of the emotion label information, numerical label information, and symptom label information.

Depending on the field to which the present invention is applied, the health information database (110) may store additional health label information besides the emotion, numerical, symptom, and treatment label information, or health label information different from these. The label similarity matrix generation unit (120) generates label similarity matrices according to the types of health label information stored in the health information database (110).

The total similarity matrix generation unit (130) sums the label similarity matrices, which represent the similarity among the patients for each type of health label information, to generate a total similarity matrix between patients. Depending on the field to which the present invention is applied, the label similarity matrices may be summed with equal weights or with different weights, and both fall within the scope of the present invention.

The preliminary cluster information generation unit (140) uses the generated total similarity matrix to repeatedly increase the cluster density among patients having mutually similar attributes. It generates the preliminary cluster information of the plurality of patients when the difference between the cluster density at the previous step and the cluster density at the current step is equal to or less than a threshold, that is, when the cluster density among patients is smaller than the threshold. The clustering unit (150) clusters the plurality of patients into a set number of clusters based on the distribution density characteristics of the generated preliminary cluster information.

FIG. 2 is a functional block diagram for explaining the label similarity matrix generation unit according to the present invention in more detail.

Referring to FIG. 2, the emotion matrix generation unit (111) calculates the similarity between the emotion label information of the plurality of patients stored in the health information database (110), and generates an emotion label similarity matrix whose elements are the calculated similarities. The numerical matrix generation unit (113) calculates the similarity between the patients' numerical label information stored in the health information database (110), and generates a numerical label similarity matrix whose elements are the calculated similarities. The symptom matrix generation unit (115) calculates the similarity between the patients' symptom label information stored in the health information database (110), and generates a symptom label similarity matrix whose elements are the calculated similarities. The treatment matrix generation unit (117) calculates the similarity between the patients' treatment label information stored in the health information database, and generates a treatment label similarity matrix whose elements are the calculated similarities.

Preferably, the similarity between emotion information, between numerical information, between symptom information, and between treatment information is calculated using the Euclidean distance, which is computed from the per-item differences between the identification values that make up each patient's emotion information.
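As a rough illustration of the matrix described above, the following Python sketch builds an n×n matrix whose entry (i, j) is the Euclidean distance between the identification-value vectors of patients i and j. The function name `label_matrix` and the sample emotion values are hypothetical; the source does not specify how records with missing labels are handled, so the sketch assumes every patient has a complete vector.

```python
import math


def label_matrix(records):
    """n x n matrix of Euclidean distances between per-patient value vectors.

    records[i] is the list of identification values for patient i.
    Assumption: all patients have the same items, with no missing values.
    """
    n = len(records)
    return [[math.dist(records[i], records[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical emotion label identification values for three patients.
emotion = [[1, 2], [1, 2], [4, 6]]
for row in label_matrix(emotion):
    print(row)
```

The same function would apply unchanged to the numerical and symptom label information, since each is stored as identification values per item.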

The matrix normalization unit (119) normalizes the element values of the emotion label similarity matrix, the numerical label similarity matrix, and the symptom label similarity matrix to the same reference value. For example, the element values of the emotion label similarity matrix, the numerical label similarity matrix, and the symptom label similarity matrix are normalized to lie between 0 and 1.
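One way to bring the element values into the range 0 to 1 is min-max scaling, sketched below in Python. The source does not name a specific normalization formula, so min-max scaling, the function name `normalize_matrix`, and the handling of a constant matrix are all assumptions made for illustration.

```python
def normalize_matrix(matrix):
    """Scale every element of a matrix into [0, 1] by min-max scaling.

    Assumption: min-max scaling is used; the source only says the values
    are normalized to lie between 0 and 1.
    Assumption: a matrix whose elements are all equal maps to all zeros.
    """
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / (hi - lo) for v in row] for row in matrix]


print(normalize_matrix([[0, 5], [5, 0]]))
```

Normalizing each label matrix to the same range keeps one label type from dominating the others when the matrices are later summed.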

FIG. 3 is a functional block diagram for explaining the preliminary cluster information generation unit according to the present invention in more detail.

Referring to FIG. 3, the initial vector generation unit (141) generates an initial vector (v₀) of size 1×n when the total similarity matrix is of size n×n. The initial vector is generated either by setting its elements to arbitrary values from 1 to 9 and then normalizing them to values from 0 to 1, or by dividing the sum of each row of the total similarity matrix by the sum of the whole matrix.
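The second option, row sums divided by the sum of the whole matrix, can be sketched in Python as follows. The function name `initial_vector` and the sample matrix are hypothetical, and the sketch assumes the matrix has a non-zero total.

```python
def initial_vector(similarity):
    """Initial vector v0: each row sum divided by the sum of all elements.

    Assumption: the total of the matrix is non-zero.
    """
    row_sums = [sum(row) for row in similarity]
    total = sum(row_sums)
    return [s / total for s in row_sums]


# Hypothetical 2 x 2 total similarity matrix.
v0 = initial_vector([[1, 1], [1, 3]])
print(v0)
```

By construction the elements of this vector lie between 0 and 1 and sum to 1 when the matrix entries are non-negative.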

The next vector generation unit (143) sets the initial vector as the previous vector (v) and multiplies the previous vector by the total similarity matrix to generate the next vector (v'). The densification unit (145) includes a difference calculation unit (145-1) and a densification control unit (145-3). The difference calculation unit (145-1) calculates the difference between the next vector (v') and the previous vector (v). The densification control unit (145-3) determines whether the difference calculated by the difference calculation unit (145-1) is smaller than a threshold; when the difference is larger than the threshold, it controls the next vector generation unit (143) to set the next vector as the previous vector and generate a next vector again. When the densification control unit (145-3) determines that the difference between the previous vector and the next vector is smaller than the threshold, the preliminary cluster determination unit (147) determines the next vector to be the preliminary cluster information of the plurality of patients.
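The loop described above resembles a power-iteration style update, and a minimal Python sketch is given below. Several details are assumptions that the source does not state: the vector is rescaled to sum to 1 after each multiplication, the "difference" is measured as the largest absolute element-wise change, and a `max_iter` cap stops the loop. The function name and the sample matrix are hypothetical.

```python
def preliminary_cluster_vector(similarity, v0, threshold=1e-6, max_iter=1000):
    """Repeat v' = v * W until successive vectors differ by less than threshold."""
    n = len(similarity)
    prev = list(v0)
    for _ in range(max_iter):
        # Next vector: previous (row) vector times the total similarity matrix.
        nxt = [sum(prev[i] * similarity[i][j] for i in range(n))
               for j in range(n)]
        # Assumption: rescale so the elements sum to 1; without this the
        # magnitude changes every step and the difference test cannot settle.
        total = sum(nxt)
        nxt = [x / total for x in nxt]
        # Assumption: difference = largest absolute element-wise change.
        if max(abs(a - b) for a, b in zip(nxt, prev)) < threshold:
            return nxt
        prev = nxt
    return prev  # Assumption: stop after max_iter steps.


# Hypothetical matrix: patients 0 and 1 are alike, patient 2 differs.
W = [[1.0, 0.9, 0.1],
     [0.9, 1.0, 0.1],
     [0.1, 0.1, 1.0]]
v = preliminary_cluster_vector(W, [1 / 3, 1 / 3, 1 / 3])
print(v)
```

In this toy matrix the two similar patients end up with nearly identical vector elements, which is the property the later clustering step relies on.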

FIG. 4 is a functional block diagram for explaining a treatment advice system that provides treatment label information to a new patient using training data of health information generated by the health information clustering apparatus according to the present invention.

Referring to FIG. 4 in more detail, the treatment advice system includes a health information clustering apparatus (100), a treatment advice apparatus (200), and a user terminal (400), each connected to a wired/wireless network (300).

As described above with reference to FIGS. 1 to 3, the health information clustering apparatus (100) uses a health information database in which only some records have treatment label information to cluster similar patients, and thereby generates health information training data in which patients with similar health states are grouped. When the treatment advice apparatus (200) receives a new patient's health information, that is, emotion label information, numerical label information, and symptom label information, over the network (300) from the user terminal (400) carried by the new patient, it determines the cluster whose health information is most similar to the received health information and provides the treatment label information of that cluster to the user terminal (400) over the network (300).
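The lookup performed by the treatment advice apparatus can be sketched in Python as a nearest-centroid search. This is an illustrative reading only: the dictionary layout, the function name `advise`, the treatment names, and the use of Euclidean distance to judge "most similar" are all assumptions not fixed by the source.

```python
import math


def advise(new_patient, clusters):
    """Return the treatment label of the cluster nearest to the new patient.

    Assumption: each cluster is summarized by a centroid over the emotion,
    numerical, and symptom values, and similarity is judged by Euclidean
    distance to that centroid.
    """
    best = min(clusters, key=lambda c: math.dist(new_patient, c["centroid"]))
    return best["treatment"]


# Hypothetical cluster summaries.
clusters = [
    {"centroid": [1.0, 2.0, 0.0], "treatment": "treatment-A"},
    {"centroid": [4.0, 5.0, 1.0], "treatment": "treatment-B"},
]
print(advise([1.2, 2.1, 0.0], clusters))
```

Because the new patient supplies no treatment label, only the non-treatment values take part in the comparison in this sketch.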

Depending on the field to which the present invention is applied, the treatment advice apparatus (200) may further include a user interface unit (not shown) and receive the new patient's health information through that user interface unit, instead of receiving it from a separate user terminal (400).

FIG. 5 is a flowchart for explaining a health information clustering method according to the present invention.

Referring to FIG. 5, from a health information database composed of health information with a per-patient treatment label and health information without a treatment label for a plurality of patients, a label similarity matrix is generated for each type of health label information, including treatment label information, each matrix representing the similarity between patients for that health label information (S100). FIG. 6 illustrates an example of the health label information stored in the health information database. FIG. 6(a) shows an example of per-patient emotion label information, FIG. 6(b) an example of per-patient numerical label information, FIG. 6(c) an example of per-patient symptom label information, and FIG. 6(d) an example of per-patient treatment label information. Preferably, each type of health label information consists of a plurality of items, and the value of each item is matched to the identification value of the corresponding level.

Referring again to FIG. 5, each label similarity matrix is multiplied by a weight, and the weighted label similarity matrices are summed to generate the total similarity matrix between patients (S200).
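The weighted sum of step S200 can be sketched as follows. This is a minimal illustration only: the function name `total_similarity`, the two-patient matrices, and the weights 0.25 and 0.75 are hypothetical, since the description does not state how the weights are chosen.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def total_similarity(label_matrices: Sequence[Matrix],
                     weights: Sequence[float]) -> Matrix:
    """Element-wise weighted sum of equally sized square matrices."""
    if len(label_matrices) != len(weights):
        raise ValueError("one weight per label similarity matrix is required")
    n = len(label_matrices[0])
    total = [[0.0] * n for _ in range(n)]
    for matrix, weight in zip(label_matrices, weights):
        for i in range(n):
            for j in range(n):
                total[i][j] += weight * matrix[i][j]
    return total


# Hypothetical normalized matrices for two patients and two label types.
emotion = [[0.0, 0.5], [0.5, 0.0]]
treatment = [[0.0, 1.0], [1.0, 0.0]]

# Hypothetical weights; the description does not give concrete values.
total = total_similarity([emotion, treatment], [0.25, 0.75])
```

Because every label similarity matrix is normalized to the same range before this step, the weights alone decide how strongly each label type influences the total.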

In the total similarity matrix, patients with similar attributes are densified until the cluster density is smaller than a threshold, which generates preliminary cluster information for the plurality of patients (S300), and the patients are then clustered into a set number of clusters based on the distribution density characteristics of the preliminary cluster information (S400). Preferably, the preliminary cluster information is clustered into the set number of clusters by the K-means clustering algorithm according to its distribution density characteristics.

FIG. 10 illustrates an example of the K-means clustering algorithm. When a set of data exists as in FIG. 10(a) and is to be clustered into two clusters, as many data points as the desired number of clusters are selected arbitrarily. If data 1 and 2 are selected as in FIG. 10(b), each of the remaining data 3 and 4 joins whichever of data 1 and data 2 is closer, forming preliminary clusters. As in FIG. 10(c), a center point is computed as the mean of the data in each preliminary cluster, and data 1 to 4 are each assigned to the nearer of the computed center points (c1, c2) to form new clusters. As in FIG. 10(d), new center points (c3, c4) are computed as the mean of the data in each newly formed cluster, and data 1 to 4 are again each assigned to the nearer of the new center points (c3, c4). Clustering is complete when the newly formed clusters are identical to the previously formed ones.
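The K-means procedure of FIG. 10 can be sketched as follows. The sketch assumes that the first k data points serve as the arbitrarily selected initial centers and that squared Euclidean distance decides which center is nearer; the names `k_means`, `mean_point`, and `squared_distance` and the four sample points are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    count = len(points)
    return tuple(sum(coords) / count for coords in zip(*points))


def k_means(data: Sequence[Point], k: int, max_iter: int = 100) -> List[int]:
    """Return the cluster index of every point in `data`."""
    # Assumption: the first k points are the arbitrarily selected seeds.
    centers = [data[i] for i in range(k)]
    assignment: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in data
        ]
        # Stop when the clusters are identical to the previous ones.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(data, assignment) if a == c]
            if members:
                centers[c] = mean_point(members)
    return assignment if assignment is not None else []


# Hypothetical data 1 to 4; data 1 and 2 act as the initial centers.
points = [(0.0, 0.0), (10.0, 10.0), (1.0, 1.0), (9.0, 9.0)]
labels = k_means(points, k=2)
```

With these points, data 3 joins data 1 and data 4 joins data 2, and the second pass leaves the clusters unchanged, so the loop stops.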

FIG. 7 is a flowchart illustrating in more detail the step of generating the label similarity matrices in the health information clustering method according to the present invention.

Referring to FIG. 7 in more detail, health label information of the same type is extracted from the health information database, which holds various types of health label information for a plurality of patients; the Euclidean distance between patients is computed for that health label information, and the similarity of the health label information between patients is computed from the Euclidean distance (S110).

For example, when the health label information of patients 1 to 5 is stored in the health information database, the Euclidean distance is computed from the differences between identical items of the emotion label information for patient 1 and patient 2, patient 1 and patient 3, patient 1 and patient 4, and patient 1 and patient 5. The Euclidean distance between patient (i) and patient (j) for health label information (S_k) is computed by Equation (1) below:

[Equation 1]

Here, a, b, ..., j are the per-item identification values of patient (i) and patient (j) that make up the health label information (S_k).

From the similarity of health label information between patients computed from the Euclidean distance, a label similarity matrix representing the similarity of each type of health label information is generated (S120). The elements of the treatment label similarity matrix take the value 0 or 1: when the treatment information of two patients matches, the Euclidean distance is 0 and the element value is likewise 0, whereas when it does not match, the Euclidean distance exceeds 0 and the element value is set to 1.
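Steps S110 and S120 can be sketched as follows. The description does not state how a Euclidean distance is converted into a similarity value except for treatment labels, so this sketch simply stores the raw distance for the other label types; that choice, the function names, and the sample identification values are assumptions made for illustration.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance over the items of one label type."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def label_matrix(records: Sequence[Sequence[float]],
                 binary: bool = False) -> Matrix:
    """Pairwise matrix for one label type.

    With binary=True (the treatment label rule), an element is 0 when
    the distance is 0 and 1 when the distance exceeds 0.
    """
    n = len(records)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            distance = euclidean_distance(records[i], records[j])
            if binary:
                matrix[i][j] = 1.0 if distance > 0 else 0.0
            else:
                matrix[i][j] = distance  # assumption: raw distance is kept
    return matrix


# Hypothetical identification values for three patients.
emotion_records = [(1, 2), (4, 6), (1, 2)]
treatment_records = [(1,), (2,), (1,)]

emotion_matrix = label_matrix(emotion_records)
treatment_matrix = label_matrix(treatment_records, binary=True)
```

Patients 1 and 3 have identical values in both label types, so their entries are 0 in both matrices.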

For example, the emotion label similarity matrix, numeric label similarity matrix, symptom label similarity matrix, and treatment label similarity matrix of patients 1 to 5 are as shown in Equations (2) to (5) below.

[Equation 2]

[Equation 3]

[Equation 4]

[Equation 5]

The label similarity matrix generated for each type of health label information is normalized to the same reference values (S130). Preferably, the reference values are 0 and 1, so that the elements of each label similarity matrix are normalized to lie between 0 and 1.
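One way to carry out the normalization of step S130 is min-max scaling, sketched below. The description only requires that the elements end up between 0 and 1; min-max scaling, the function name `normalize_matrix`, and the sample matrix are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def normalize_matrix(matrix: Matrix) -> Matrix:
    """Scale all elements linearly so that they lie between 0 and 1."""
    values = [value for row in matrix for value in row]
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Every element is equal; map them all to 0 to avoid dividing by 0.
        return [[0.0 for _ in row] for row in matrix]
    span = highest - lowest
    return [[(value - lowest) / span for value in row] for row in matrix]


# Hypothetical unnormalized matrix for three patients.
raw = [[0.0, 5.0, 10.0], [5.0, 0.0, 5.0], [10.0, 5.0, 0.0]]
normalized = normalize_matrix(raw)
```

Bringing every label type to the same range keeps a label type with large raw values from dominating the later weighted sum.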

FIG. 8 is a flowchart illustrating in more detail the step of generating the preliminary cluster information in the health information clustering method according to the present invention.

Referring to FIG. 8 in more detail, when the total similarity matrix has size n×n, an initial vector of size 1×n is generated (S310) and normalized to a reference value (S320). Preferably, the initial vector is set to arbitrary values and then normalized to lie within the reference range, for example between 0 and 1.

The initial vector is set as the previous vector, and the total similarity matrix is multiplied by the previous vector to generate the next vector (S330). The previous vector is then subtracted from the next vector to determine whether their difference is smaller than a set threshold (S340).

If the difference between the previous vector and the next vector is smaller than the threshold, the next vector is output as the preliminary cluster data of the plurality of patients (S350). If the difference is larger than the threshold, the next vector is reset as the previous vector and a new next vector is generated from it; this is repeated until the difference between the regenerated next vector and the reset previous vector is smaller than the set threshold.

Through this repeated regeneration of the next vector, patients who are close to one another, that is, similar patients, become more densely grouped.
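The loop of steps S310 to S350 repeatedly multiplies a matrix by a vector, which is the pattern of a power iteration, and can be sketched as follows. The description normalizes only the initial vector; this sketch additionally rescales every next vector so that its elements sum to 1, which is an assumption made to keep the values bounded. The function names, the seed, the default threshold, and the sample matrix are likewise hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]


def normalize(vector: List[float]) -> List[float]:
    """Rescale a vector so that its absolute values sum to 1."""
    total = sum(abs(x) for x in vector)
    return [x / total for x in vector] if total else list(vector)


def preliminary_cluster_vector(total_similarity: Matrix,
                               threshold: float = 1e-6,
                               max_iter: int = 1000,
                               seed: int = 0) -> List[float]:
    """Iterate next = matrix * previous until the change is below threshold."""
    n = len(total_similarity)
    rng = random.Random(seed)
    # S310/S320: arbitrary initial vector, normalized into [0, 1].
    previous = normalize([rng.random() for _ in range(n)])
    for _ in range(max_iter):
        # S330: multiply the total similarity matrix by the previous vector.
        product = [
            sum(total_similarity[i][j] * previous[j] for j in range(n))
            for i in range(n)
        ]
        nxt = normalize(product)  # assumption: rescale on every pass
        # S340: largest element-wise change between the two vectors.
        difference = max(abs(a - b) for a, b in zip(nxt, previous))
        previous = nxt
        if difference < threshold:
            break  # S350: the last vector is the preliminary cluster data
    return previous


# Hypothetical matrix in which patients 1 and 2 resemble each other
# far more than either resembles patient 3.
matrix = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]
vector = preliminary_cluster_vector(matrix)
```

In this example the first two elements of the resulting vector converge to nearly the same value while the third stays apart, which is the kind of grouping that the subsequent clustering step can separate.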

FIG. 9 is a flowchart illustrating the treatment advice method according to the present invention.

Referring to FIG. 9 in more detail, in a health information database composed of health information with a per-patient treatment label and health information without a treatment label for a plurality of patients, a label similarity matrix is generated for each type of label information, including treatment label information; each matrix represents the similarity between patients for that type of label information (S500).

A total similarity matrix between patients is generated from the sum of the label similarity matrices (S600). The health information of patients with similar attributes in the total similarity matrix is densified to generate preliminary cluster information for the plurality of patients, and the patients are clustered into a set number of clusters based on the distribution density characteristics of the preliminary cluster information (S700).

Cluster information for each cluster is generated from the central value of the health information, per type of label information, of the clustered patients (S800). For each cluster, the mean of each type of label information over the patients in the cluster is computed and used as that cluster's label information.

The similarity between the health information of a new patient and the cluster information is determined, and the treatment label information of the cluster most similar to the new patient is provided as the treatment label information for the new patient (S900).
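Steps S800 and S900 can be sketched as follows. The sketch assumes that the most similar cluster is the one whose center has the smallest Euclidean distance to the new patient's values; the description does not name the similarity measure for this step. The function names, the sample values, and the treatment identifiers "T1" and "T2" are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(records: Sequence[Sequence[float]]) -> Point:
    """S800: mean of each label item over the patients of one cluster."""
    count = len(records)
    return tuple(sum(column) / count for column in zip(*records))


def nearest_cluster_treatment(new_patient: Sequence[float],
                              centers: Sequence[Point],
                              treatments: Sequence[str]) -> str:
    """S900: treatment label of the cluster closest to the new patient."""
    distances = [
        math.sqrt(sum((x - c) ** 2 for x, c in zip(new_patient, center)))
        for center in centers
    ]
    return treatments[distances.index(min(distances))]


# Hypothetical emotion, numeric, and symptom values per patient.
cluster_a = [(1, 1, 2), (3, 1, 2)]
cluster_b = [(8, 9, 7), (10, 9, 9)]
centers = [centroid(cluster_a), centroid(cluster_b)]

# Hypothetical treatment label identifiers of the two clusters.
advice = nearest_cluster_treatment((2, 2, 2), centers, ["T1", "T2"])
```

Only the emotion, numeric, and symptom values are compared here, because a new patient arrives without a treatment label.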

Meanwhile, the embodiments of the present invention described above can be written as a program executable on a computer and implemented on a general-purpose digital computer that runs the program from a computer-readable recording medium.

The computer-readable recording medium includes storage media such as magnetic storage media (e.g., ROM, floppy disks, hard disks), optically readable media (e.g., CD-ROM, DVD), and carrier waves (e.g., transmission over the Internet).

Although the present invention has been described with reference to the embodiments shown in the drawings, these are merely illustrative, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. Accordingly, the true scope of technical protection of the present invention should be determined by the technical spirit of the appended claims.

110: Health information database 120: Label similarity matrix generation unit
130: Total similarity matrix generation unit 140: Preliminary cluster information generation unit
150: Clustering unit 111: Emotion matrix generation unit
113: Numeric matrix generation unit 115: Symptom matrix generation unit
117: Treatment matrix generation unit 119: Matrix normalization unit
141: Initial vector generation unit 143: Next vector generation unit
145: Densification unit
100: Health information clustering apparatus
200: Treatment advice apparatus
300: Network
400: User terminal

Claims

1. A method for clustering health information, comprising:
(a) generating, in a label similarity matrix generation unit, label similarity matrices each representing the similarity between patients for one type of label information, from a health information database composed of health information with a per-patient treatment label and health information without a treatment label for a plurality of patients;
(b) generating, in a total similarity matrix generation unit, a total similarity matrix between the patients from the sum of the label similarity matrices;
(c) generating, in a preliminary cluster information generation unit, preliminary cluster information of the plurality of patients by densifying patients having similar attributes in the total similarity matrix until the cluster density is smaller than a threshold; and
(d) clustering, in a clustering unit, the plurality of patients into a set number of clusters based on distribution density characteristics of the preliminary cluster information.

2. The method according to claim 1, wherein the similarity of each type of label information between the patients is calculated using the Euclidean distance between the patients for that label information.

3. The method according to claim 2, wherein the similarity of the treatment label information is set to 1 when the Euclidean distance exceeds 0.

4. The method according to claim 1, wherein the types of label information include at least one of emotion label information, numeric label information, symptom label information, and treatment label information.

5. The method according to claim 4, wherein the value of each element of the label similarity matrices for the emotion label information, the numeric label information, the symptom label information, and the treatment label information is normalized to the same first reference value.

6. The method according to claim 5, wherein the emotion label information, the numeric label information, the symptom label information, and the treatment label information are stored as identification values matched to an identifier of each item constituting each type of label information.

7. The method according to any one of claims 1 to 6, wherein generating the preliminary cluster information in the preliminary cluster information generation unit comprises:
(c1) generating an initial vector normalized to a second reference value;
(c2) setting the initial vector as a previous vector and generating a next vector from the product of the total similarity matrix and the previous vector;
(c3) determining whether the difference between the previous vector and the next vector is smaller than a threshold; and
(c4) generating the next vector as the preliminary cluster information of the plurality of patients when the difference between the previous vector and the next vector is smaller than the threshold,
wherein, when the difference between the previous vector and the next vector is larger than the threshold, the next vector is set as the previous vector and steps (c2) to (c3) are repeated.

8. The method according to claim 7, wherein the preliminary cluster information is clustered into a set number of clusters according to distribution density characteristics of the preliminary clustering information through a K-means clustering algorithm.

9. A treatment advice method comprising:
(a) generating label similarity matrices each representing the similarity between patients for one type of label information, including treatment label information, from a health information database composed of health information with a per-patient treatment label and health information without a treatment label for a plurality of patients;
(b) generating a total similarity matrix between the patients from the sum of the label similarity matrices;
(c) generating preliminary cluster information of the plurality of patients by densifying the health information of patients having similar attributes in the total similarity matrix, and clustering the plurality of patients into a set number of clusters based on distribution density characteristics of the preliminary cluster information;
(d) generating cluster information for each cluster from the central value of the health information, per type of label information, of the clustered patients; and
(e) determining the similarity between the health information of a new patient and the cluster information, and providing the treatment label information of the cluster most similar to the new patient as the treatment label information for the new patient,
wherein steps (a) to (d) are performed in a health information clustering apparatus and step (e) is performed in a treatment advice apparatus.

10. The method according to claim 9, wherein generating the preliminary cluster information comprises:
(c1) generating an initial vector normalized to a reference value;
(c2) setting the initial vector as a previous vector and generating a next vector from the product of the total similarity matrix and the previous vector;
(c3) determining whether the difference between the previous vector and the next vector is smaller than a threshold; and
(c4) generating the preliminary cluster data of the plurality of patients based on the next vector when the difference between the previous vector and the next vector is smaller than the threshold,
wherein, when the difference between the previous vector and the next vector is larger than the threshold, the next vector is set as the previous vector and steps (c2) to (c3) are repeated.

11. The method according to claim 10, wherein the types of label information include at least one of emotion label information, numeric label information, symptom label information, and treatment label information.

12. A health information clustering apparatus comprising:
a label similarity matrix generation unit for generating label similarity matrices each representing the similarity between patients for one type of label information, from a health information database composed of health information with a per-patient treatment label and health information without a treatment label for a plurality of patients;
a total similarity matrix generation unit for generating a total similarity matrix between the patients from the sum of the label similarity matrices;
a preliminary cluster information generation unit for generating preliminary cluster information of the plurality of patients by densifying patients having similar attributes in the total similarity matrix until the cluster density is smaller than a threshold; and
a clustering unit for clustering the plurality of patients into a set number of clusters based on distribution density characteristics of the preliminary cluster information.

13. The apparatus according to claim 12, wherein the label similarity matrix generation unit comprises:
an emotion matrix generation unit for calculating the similarity of emotion label information between the patients and generating an emotion label similarity matrix between the patients;
a numeric matrix generation unit for calculating the similarity of numeric label information between the patients and generating a numeric label similarity matrix between the patients;
a symptom matrix generation unit for calculating the similarity of symptom label information between the patients and generating a symptom label similarity matrix between the patients; and
a treatment matrix generation unit for calculating the similarity of treatment label information between the patients and generating a treatment label similarity matrix between the patients.

14. The apparatus according to claim 13, wherein the label similarity matrix generation unit further comprises a matrix normalization unit for normalizing the emotion label similarity matrix, the numeric label similarity matrix, the symptom label similarity matrix, and the treatment label similarity matrix to the same first reference value.

15. The apparatus according to claim 12, wherein the preliminary cluster information generation unit comprises:
an initial vector generation unit for generating an initial vector normalized to a second reference value;
a next vector generation unit for setting the initial vector as a previous vector and multiplying the total similarity matrix by the previous vector to generate a next vector;
a densification unit for determining whether the difference between the previous vector and the next vector is smaller than a threshold and, when the determination shows that the difference is larger than the threshold, controlling the next vector to be set as the previous vector and the next vector to be regenerated; and
a preliminary cluster determination unit for generating the next vector as the preliminary cluster information of the plurality of patients when, based on the determination of the densification unit, the difference between the previous vector and the next vector is smaller than the threshold.

16. The apparatus according to claim 15, wherein the clustering unit clusters the preliminary cluster information, through a K-means clustering algorithm, into a number of clusters set according to the distribution density characteristics of the preliminary cluster information.