KR20200053185A

KR20200053185A - System and method for evaluating performance of symptom similarity measure apparatus

Info

Publication number: KR20200053185A
Application number: KR1020180136481A
Authority: KR
Inventors: 이정설
Original assignee: 주식회사 쓰리빌리언
Priority date: 2018-11-08
Filing date: 2018-11-08
Publication date: 2020-05-18
Also published as: KR102167697B1

Abstract

The present invention relates to a system of evaluating performance of an arbitrary symptom similarity measuring device which measures similarity between a set of patient symptoms which a patient has and a set of disease symptoms known for a disease. The system of evaluating performance of a symptom similarity measuring device comprises: a similarity calculation unit for calculating a first similarity between a set of patient symptoms and a set of symptoms of diseases related to genetic mutation and a second similarity between the set of patient symptoms and a set of symptoms of diseases unrelated to genetic mutation, for each patient by the symptom similarity measuring device; a similarity correction unit for generating a first corrected similarity by correcting the first similarity and generating a second corrected similarity by correcting the second similarity; a total similarity correction unit for generating a first total corrected similarity obtained by combining the first corrected similarity for all patients and a second total corrected similarity obtained by combining the second corrected similarity for all patients; and a verification unit for verifying performance by comparing the first total corrected similarity and the second total corrected similarity.

Description

Performance evaluation system and method for symptom similarity meter {SYSTEM AND METHOD FOR EVALUATING PERFORMANCE OF SYMPTOM SIMILARITY MEASURE APPARATUS}

The present invention relates to a method and apparatus for evaluating the performance of a symptom similarity measurer, which measures the similarity between a patient's symptoms and the symptoms known for a disease.

For accurate diagnosis, it is important to propose likely diseases based on the symptoms observed in a patient. Because symptom information has now been compiled for individual diseases, methods and devices (software) that assist diagnosis by comparing a patient's symptoms with the known symptoms of each disease are widely used.

However, there is no clear standard for how the symptoms of each disease should be compared with the symptoms of a patient.

Because many diseases share the same symptoms, and because the symptoms of a given disease do not always appear with the same frequency, identifying the correct disease by comparing disease symptoms with patient symptoms is a difficult problem.

One available approach measures the similarity between a patient symptom set and a disease symptom set based on the symptoms shared between those observed in the patient and those known for each disease. Another measures that similarity based on the similarity of the protein interaction networks associated with each symptom. Symptom similarity measurers implementing such approaches may be used.

However, little consideration has been given to evaluating the performance of symptom similarity measurement methods and symptom similarity measurers.

Only when the performance of a method for measuring the similarity between a patient symptom set and a disease symptom set, and of the corresponding symptom similarity measurer, can be evaluated can the development of improved and more accurate symptom similarity measurers be expected.

The technical object of the present invention is to provide a performance evaluation method and apparatus that evaluate the performance of an arbitrary symptom similarity measurer that measures the similarity between a patient's symptoms and the symptoms known for a disease.

To solve this problem, a performance evaluation system according to an embodiment of the present invention evaluates an arbitrary symptom similarity measurer that measures the similarity between a patient symptom set that a patient has and a disease symptom set known for a disease. The system comprises: a similarity calculation unit that uses the symptom similarity measurer to calculate, for each patient, a first similarity between the patient symptom set and a genetic-variation-associated disease symptom set, and a second similarity between the patient symptom set and a genetic-variation-unrelated disease symptom set; a similarity correction unit that corrects the first similarity to generate a first corrected similarity and corrects the second similarity to generate a second corrected similarity; a total similarity correction unit that generates a first total corrected similarity by combining the first corrected similarities of all patients and a second total corrected similarity by combining the second corrected similarities of all patients; and a verification unit that verifies performance by comparing the first total corrected similarity with the second total corrected similarity.

The similarity calculation unit may include: a selection unit that selects the symptom similarity measurer whose performance is to be evaluated; a receiving unit that acquires the genetic variation information of a patient; a disease information search unit that searches for information on genetic-variation-associated diseases caused by the genetic variation and on genetic-variation-unrelated diseases not related to the genetic variation; a symptom information search unit that searches for information on the symptoms known for the genetic-variation-associated diseases and the symptoms known for the genetic-variation-unrelated diseases; and a calculation unit that uses the symptom similarity measurer to measure the first similarity between the patient symptom set and the genetic-variation-associated disease symptom set and the second similarity between the patient symptom set and the genetic-variation-unrelated disease symptom set.

The similarity correction unit may generate the first corrected similarity by Equation 1 below and the second corrected similarity by Equation 2 below.

Equation 1

(first similarity - ave) / stdev

Equation 2

(second similarity - ave) / stdev

(Here, ave is the mean of all the first similarities and second similarities taken together, and stdev is the standard deviation of all the first similarities and second similarities taken together.)

The verification unit may compare a representative value of the first total corrected similarity distribution with a representative value of the second total corrected similarity distribution.

The representative value may be the mean or the median.

The verification unit may evaluate the performance as better the larger the difference between the representative value of the first total corrected similarity distribution and the representative value of the second total corrected similarity distribution.

A performance evaluation method for a symptom similarity measurer according to an embodiment of the present invention comprises: a similarity calculation step in which a similarity calculation unit uses an arbitrary symptom similarity measurer to calculate, for each patient, a first similarity between the patient symptom set and a genetic-variation-associated disease symptom set, and a second similarity between the patient symptom set and a genetic-variation-unrelated disease symptom set; a similarity correction step in which a similarity correction unit corrects the first similarity to generate a first corrected similarity and corrects the second similarity to generate a second corrected similarity; a total similarity correction step in which a total similarity correction unit generates a first total corrected similarity by combining the first corrected similarities of all patients and a second total corrected similarity by combining the second corrected similarities of all patients; and a verification step in which a verification unit verifies performance by comparing the first total corrected similarity with the second total corrected similarity.

The similarity calculation step may include: selecting, in a selection unit, the symptom similarity measurer whose performance is to be evaluated; acquiring, in a receiving unit, the genetic variation information of a patient; searching, in a disease information search unit, for information on genetic-variation-associated diseases caused by the genetic variation and on genetic-variation-unrelated diseases not related to the genetic variation; acquiring, in a symptom information search unit, information on the symptoms known for the genetic-variation-associated diseases and the symptoms known for the genetic-variation-unrelated diseases; and measuring, in a calculation unit, the first similarity between the patient symptom set and the genetic-variation-associated disease symptom set and the second similarity between the patient symptom set and the genetic-variation-unrelated disease symptom set, both by the symptom similarity measurer.

In the similarity correction step, the similarity correction unit may generate the first corrected similarity by Equation 1 below and the second corrected similarity by Equation 2 below.

Equation 1

(first similarity - ave) / stdev

Equation 2

(second similarity - ave) / stdev

In the verification step, the verification unit may compare a representative value of the first total corrected similarity distribution with a representative value of the second total corrected similarity distribution.

The representative value may be the mean or the median.

The verification unit may evaluate the performance as better the larger the difference between the representative value of the first total corrected similarity distribution and the representative value of the second total corrected similarity distribution.

In addition to the technical object mentioned above, other features and advantages of the present invention are described below, or will be clearly understood from that description by those of ordinary skill in the art to which the present invention pertains.

The present invention as described above has the following effects.

By using the association between the genetic variations a patient has and diseases, the present invention can verify the performance of a symptom similarity measurer that evaluates the similarity between a patient symptom set and a disease symptom set.

By correcting the similarities produced by the symptom similarity measurer and verifying them over all patients, the present invention can improve the reliability of the verification.

Further features and advantages of the present invention may also become apparent through its embodiments.

FIG. 1 is a configuration diagram of a performance evaluation system for a symptom similarity measurer according to an embodiment of the present invention.
FIG. 2 is a configuration diagram of a similarity calculation unit according to the present invention.
FIG. 3 is an exemplary view showing the hierarchical structure of symptoms.
FIG. 4 is a flowchart of a performance evaluation method for a symptom similarity measurer according to an embodiment of the present invention.
FIG. 5 is a flowchart of the similarity calculation unit according to the present invention.

It should be noted that, in assigning reference numerals to the components of each drawing in this specification, the same components are given the same numerals wherever possible, even when they appear in different drawings.

The terms used in this specification should be understood as follows.

A singular expression should be understood to include the plural unless the context clearly indicates otherwise. Terms such as "first" and "second" serve to distinguish one component from another, and the scope of rights should not be limited by these terms.

Terms such as "include" or "have" should be understood as not excluding in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Hereinafter, preferred embodiments of the present invention, devised to solve the above problems, are described in detail with reference to the accompanying drawings.

FIG. 1 is a configuration diagram of a performance evaluation system for a symptom similarity measurer according to an embodiment of the present invention, and FIG. 2 is a configuration diagram of a similarity calculation unit according to the present invention.

Referring to FIGS. 1 and 2, the performance evaluation system (1000) for a symptom similarity measurer according to an embodiment of the present invention includes a similarity calculation unit (100), a similarity correction unit (200), a total similarity correction unit (300), and a verification unit (400).

The similarity calculation unit (100) may use an arbitrary symptom similarity measurer to calculate, for each patient, a first similarity based on genetic-variation-associated disease information and a second similarity based on disease information unrelated to the genetic variation.

Here, the symptom similarity measurer is the object whose performance is to be verified by the performance evaluation system according to the embodiment of the present invention.

A symptom similarity measurer is a device or algorithm that measures the similarity between the symptoms observed in a patient and the set of symptoms known for a disease.

For example, the symptom similarity measurer may measure the similarity between a patient symptom set and a disease symptom set based on the symptoms shared between those observed in the patient and those known for each disease, or it may measure that similarity based on the similarity of the protein interaction networks associated with each symptom.

The similarity calculation unit (100) may include a selection unit (110) that selects a symptom similarity measurer, a receiving unit (130) that acquires the genetic variation information of a patient, a disease information search unit (150) that searches for information on genetic-variation-associated diseases and genetic-variation-unrelated diseases, a symptom information search unit (170) that searches for information on genetic-variation-associated disease symptoms and genetic-variation-unrelated disease symptoms, and a calculation unit (190) that measures the first similarity and the second similarity.

The selection unit (110) may select, from among arbitrary symptom similarity measurers, the one whose performance is to be verified.

The receiving unit (130) may acquire genetic variation information from a database (500) that stores the genetic variation information of each patient. Here, a patient's genetic variation refers to a variation caused by a conversion or change in genetic composition. A genetic variation may be an allele, a single nucleotide polymorphism (SNP), a mutation, or a combination thereof. An allele refers to a gene that occupies the same position (locus) on a chromosome while expressing a different trait, that is, a gene with a different base sequence located at the same locus on homologous chromosomes. A mutation may be a point mutation, transition mutation, transversion mutation, missense mutation, nonsense mutation, duplication, deletion, insertion, translocation, inversion, or a combination thereof. An SNP refers to a variation of one or several tens of bases in the genome sequence that reflects differences between individuals.

Each patient's genetic variation information may be assigned a unique personal identification number (PIN) and a password, and may be stored in the database (500) with the PIN and password assigned.

The disease information search unit (150) may search, for each patient, for information on genetic-variation-associated diseases and genetic-variation-unrelated diseases.

A genetic-variation-associated disease is a disease known to be caused by a genetic variation observed in the patient. A genetic-variation-unrelated disease is any disease other than those known to be caused by the genetic variations observed in the patient.

That is, for each patient, the disease information search unit (150) acquires the genetic variation information observed in the patient from the database (500), distinguishes the genetic-variation-associated disease information from the genetic-variation-unrelated disease information for the acquired genetic variations, and may store both in the database (500).

The symptom information search unit (170) may search for information on genetic-variation-associated disease symptoms and genetic-variation-unrelated disease symptoms.

Here, genetic-variation-associated disease symptoms are the symptoms known for genetic-variation-associated diseases, and genetic-variation-unrelated disease symptoms are the symptoms known for genetic-variation-unrelated diseases.

That is, for each patient, the symptom information search unit (170) may search the database (500) for the genetic-variation-associated disease symptoms, which are the symptoms of the diseases that can be caused by the genetic variations the patient has, and for the genetic-variation-unrelated disease symptoms, which are the symptoms of all the remaining diseases among the entire set of diseases. Here, the entire set of diseases consists of the diseases stored in advance in the database (500) whose symptom information is known well enough for the symptom similarity measurer to calculate the similarity between a patient symptom set and a disease symptom set.
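As a non-limiting illustration, the partitioning of the entire set of diseases into associated and unrelated diseases for one patient can be sketched in Python as follows. The function name `split_diseases`, the `variant_to_diseases` mapping, and the toy identifiers are assumptions made for this sketch; the specification does not define a data format for the database (500).

```python
def split_diseases(patient_variants, variant_to_diseases, all_diseases):
    """Partition all_diseases into (associated, unrelated) for one patient.

    patient_variants: iterable of variant identifiers observed in the patient
    variant_to_diseases: hypothetical mapping variant -> diseases it is known to cause
    all_diseases: diseases with sufficiently known symptom information
    """
    universe = set(all_diseases)
    associated = set()
    for variant in patient_variants:
        # Collect every disease known to be caused by this variant.
        associated |= set(variant_to_diseases.get(variant, ()))
    # Keep only diseases that are actually in the evaluated universe.
    associated &= universe
    # Everything else is treated as unrelated to the patient's variants.
    unrelated = universe - associated
    return associated, unrelated
```

Under these assumptions, every disease in the universe falls into exactly one of the two groups, which is what the definitions above require.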

The calculation unit (190) may measure the first similarity and the second similarity by means of the symptom similarity measurer.

The first similarity is the similarity value, calculated by the symptom similarity measurer, between the patient's symptom set and a genetic-variation-associated disease symptom set of that patient. The second similarity is the similarity value, calculated by the symptom similarity measurer, between the patient's symptom set and a genetic-variation-unrelated disease symptom set of that patient.

For example, given a patient symptom set A observed in a patient and the genetic-variation-associated disease symptom sets B, one for each of the patient's N genetic-variation-associated diseases, the calculation unit (190) may use the symptom similarity measurer to measure the similarity between symptom set A and each symptom set B, producing N first similarity values.

Likewise, given the genetic-variation-unrelated disease symptom sets C, one for each of the patient's M genetic-variation-unrelated diseases, it may use the symptom similarity measurer to measure the similarity between patient symptom set A and each symptom set C, producing M second similarity values.
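As a non-limiting illustration, the generation of the N first similarity values and the M second similarity values can be sketched in Python as follows. The Jaccard index is used here only as a stand-in for a measurer based on shared symptoms; it is an assumption of this sketch, not the measurer of the specification, and the names `jaccard` and `measure_similarities` are hypothetical.

```python
def jaccard(a, b):
    """Stand-in measurer: shared symptoms divided by all symptoms in either set."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def measure_similarities(measurer, patient_symptoms, associated_sets, unrelated_sets):
    """Return (first, second) similarity lists for one patient.

    associated_sets: N symptom sets B, one per variation-associated disease
    unrelated_sets:  M symptom sets C, one per variation-unrelated disease
    """
    first = [measurer(patient_symptoms, b) for b in associated_sets]
    second = [measurer(patient_symptoms, c) for c in unrelated_sets]
    return first, second
```

Because the measurer is passed in as a parameter, any other measurer with the same two-set interface could be evaluated in the same way.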

The similarity correction unit (200) may correct the first similarity to generate a first corrected similarity and correct the second similarity to generate a second corrected similarity.

If the first corrected similarity and the second corrected similarity are not generated for each individual patient, the result depends on the characteristics of the small number of symptoms that the patient has, which makes it difficult to verify the performance of the symptom similarity measurer accurately. The distribution characteristics (mean, variance, and so on) of the first and second similarities are determined by the characteristics of the symptoms appearing in each patient.

Therefore, the distribution characteristics of the first and second similarities obtained from each patient are first made the same.

To this end, the similarity correction unit (200) may correct the first similarity by Equation 1 below to generate the first corrected similarity, and correct the second similarity by Equation 2 below to generate the second corrected similarity.

Equation 1

(first similarity - ave) / stdev

Equation 2

(second similarity - ave) / stdev

Here, ave is the mean of all the first similarities and second similarities taken together, and stdev is the standard deviation of all the first similarities and second similarities taken together.

For example, if a patient has N genetic-variation-associated diseases and M genetic-variation-unrelated diseases, the similarity correction unit (200) may compute the mean ave and the standard deviation stdev over the total of M+N symptom similarity values. It may then apply Equation 1 to each first similarity to generate the first corrected similarities, and apply Equation 2 to each second similarity to generate the second corrected similarities.
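As a non-limiting illustration, Equations 1 and 2 applied to one patient can be sketched in Python as follows. The specification does not state whether stdev is the population or the sample standard deviation; this sketch assumes the population form (`statistics.pstdev`), and the function name `correct_similarities` is hypothetical.

```python
from statistics import mean, pstdev


def correct_similarities(first, second):
    """Apply Equations 1 and 2 to one patient's similarity values.

    ave and stdev are computed over all N + M values pooled together.
    Assumption: population standard deviation (pstdev).
    """
    pooled = list(first) + list(second)
    ave = mean(pooled)
    stdev = pstdev(pooled)
    if stdev == 0:
        # All similarities are identical, so the correction is undefined.
        raise ValueError("standard deviation is zero; cannot correct")
    first_corrected = [(s - ave) / stdev for s in first]
    second_corrected = [(s - ave) / stdev for s in second]
    return first_corrected, second_corrected
```

After this correction, each patient's pooled values have mean 0 and standard deviation 1, which is one way of making the distribution characteristics the same across patients.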

The total similarity correction unit (300) may generate a first total corrected similarity by combining the first corrected similarities of all patients, and a second total corrected similarity by combining the second corrected similarities of all patients.

That is, the first total corrected similarity is the distribution of values obtained when the total similarity correction unit (300) collects the first corrected similarities of every patient, and the second total corrected similarity is the distribution of values obtained when it collects the second corrected similarities of every patient.
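As a non-limiting illustration, collecting the corrected similarities of all patients into the two distributions can be sketched in Python as follows. The function name `pool_corrected` and the representation of each patient as a pair of lists are assumptions of this sketch.

```python
def pool_corrected(per_patient):
    """Collect corrected similarities across patients.

    per_patient: iterable of (first_corrected, second_corrected) pairs,
                 one pair per patient.
    Returns two flat lists representing the two pooled distributions.
    """
    first_total, second_total = [], []
    for first_corrected, second_corrected in per_patient:
        first_total.extend(first_corrected)
        second_total.extend(second_corrected)
    return first_total, second_total
```

The values are gathered rather than added into a single number, consistent with the description of each total corrected similarity as a distribution of values.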

The verification unit (400) may verify the performance of the symptom similarity measurer by comparing the first total corrected similarity with the second total corrected similarity.

The verification unit (400) may verify the performance of the symptom similarity measurer by comparing a representative value of the first total corrected similarity distribution with a representative value of the second total corrected similarity distribution. Here, the representative value may be the mean or the median of the respective distribution.

The verification unit (400) may evaluate the performance of the symptom similarity measurer as good if the representative value of the first total corrected similarity distribution is greater than that of the second total corrected similarity distribution, and as poor if it is smaller.
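As a non-limiting illustration, the comparison of representative values can be sketched in Python as follows. The function name `verify`, its `representative` parameter, and the returned tuple are assumptions of this sketch.

```python
from statistics import mean, median


def verify(first_total, second_total, representative="median"):
    """Compare representative values of the two pooled distributions.

    Returns (gap, is_good), where gap is the first representative value
    minus the second. A positive gap is read as good performance, and a
    larger gap as better performance.
    """
    rep = {"mean": mean, "median": median}[representative]
    gap = rep(first_total) - rep(second_total)
    return gap, gap > 0
```

Under these assumptions, two measurers evaluated on the same patients could be ranked by their `gap` values.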

As described above, the performance evaluation system for a symptom similarity measuring device according to an embodiment of the present invention can verify the performance of a symptom similarity measuring device, which evaluates the similarity between a patient symptom set and a disease symptom set, by using the association between the genetic mutation a patient carries and diseases.

In addition, the performance evaluation system for a symptom similarity measuring device according to an embodiment of the present invention can improve the reliability of the verification, because the similarities produced by the symptom similarity measuring device are corrected over all patients before they are verified.

Hereinafter, an application example of the performance evaluation system for a symptom similarity measuring device according to an embodiment of the present invention will be described with reference to FIG. 3.

FIG. 3 is an exemplary diagram showing the hierarchical structure of symptoms.

Referring to FIG. 3, the hierarchical structure of symptoms has, as its top node, 'phenotype abnormality', which denotes an abnormality of the phenotype, and below it the nodes corresponding to the individual symptoms form a hierarchy. The number written next to each node is the depth of the node, that is, its distance from the top node. A single symptom may belong to several ancestor nodes, but for visual simplicity the diagram is drawn without such duplicated nodes.

First, the reliability of the performance evaluation is checked by having the performance evaluation system according to an embodiment of the present invention compare two symptom similarity measuring devices whose relative performance is clearly known in advance.

The first symptom similarity measuring device is a positive control (PC) algorithm that uses, as the similarity between two symptoms, the maximum depth of their common ancestor nodes. The second symptom similarity measuring device is a negative control (NC) algorithm that uses, as the similarity between two symptoms, the reciprocal of the maximum depth of their common ancestor nodes. In the hierarchical structure of symptoms, a deeper common ancestor node means that two different symptoms are similar at a more specific level. In the above example, it is therefore evident that the positive control device should outperform the negative control device, and a method for evaluating the performance of a symptom similarity measuring device must judge this correctly.

A symptom similarity measuring device measures the similarity between the set of symptoms known for a disease and the set of symptoms observed in a patient. For each symptom in one set, the calculation uses the similar symptom among the symptoms in the other set.

For example, suppose the disease symptom set contains node 10, 'Absent inner eyelashes', and the patient does not have this symptom but does have node 9, 'Sparse eyelashes'. In that case, the characteristics of 'Sparse or absent eyelashes', the common ancestor node of these two symptom nodes, are used to calculate the similarity between 'Absent inner eyelashes' and 'Sparse eyelashes'.

Here, the first symptom similarity measuring device uses the maximum depth of the common ancestor nodes of the two symptoms as their similarity value, and the second symptom similarity measuring device uses the reciprocal of that maximum depth as their similarity value.

Let us now use the first and second symptom similarity measuring devices to compare the similarity between absent inner eyelashes and sparse eyelashes with the similarity between absent inner eyelashes and nasal obstruction.

According to the first symptom similarity measuring device, the common ancestor node of absent inner eyelashes and sparse eyelashes is 'Sparse or absent eyelashes', whose depth is 8, so the similarity between absent inner eyelashes and sparse eyelashes is 8.

The common ancestor node of absent inner eyelashes and nasal obstruction is 'Abnormality of the face', whose depth is 3, so the similarity between absent inner eyelashes and nasal obstruction is 3.

The first symptom similarity measuring device therefore correctly reflects the fact that sparse eyelashes is a more similar symptom to absent inner eyelashes than nasal obstruction is.

According to the second symptom similarity measuring device, the common ancestor node of absent inner eyelashes and sparse eyelashes is 'Sparse or absent eyelashes', whose depth is 8, so the similarity between absent inner eyelashes and sparse eyelashes is 1/8.

The common ancestor node of absent inner eyelashes and nasal obstruction is 'Abnormality of the face', whose depth is 3, so the similarity between absent inner eyelashes and nasal obstruction is 1/3.

That is, with respect to absent inner eyelashes, the second symptom similarity measuring device gives nasal obstruction a higher score than sparse eyelashes, and so fails to reflect the fact that sparse eyelashes is the more similar symptom.

This single example already suggests that the first symptom similarity measuring device is a more reliable measure of similarity between symptom sets than the second symptom similarity measuring device.
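The two control algorithms can be illustrated with a minimal Python sketch. The hierarchy below is a shortened toy version of FIG. 3: the intermediate nodes 'Abnormality of the eyelashes' and 'Abnormality of the nose' are hypothetical, so the depths here (3 and 1) differ from the depths 8 and 3 quoted above, although the ranking behaviour is the same. Taking depth as the shortest distance from the top node, and returning infinity when only the top node is shared, are assumptions that the description does not specify.

```python
import math
from functools import lru_cache

# Toy hierarchy (child -> parents); hypothetical and much shallower than FIG. 3.
PARENTS = {
    "phenotype abnormality": [],
    "Abnormality of the face": ["phenotype abnormality"],
    "Abnormality of the eyelashes": ["Abnormality of the face"],
    "Sparse or absent eyelashes": ["Abnormality of the eyelashes"],
    "Sparse eyelashes": ["Sparse or absent eyelashes"],
    "Absent inner eyelashes": ["Sparse or absent eyelashes"],
    "Abnormality of the nose": ["Abnormality of the face"],
    "Nasal obstruction": ["Abnormality of the nose"],
}

@lru_cache(maxsize=None)
def depth(node):
    """Distance from the top node (assumption: shortest path)."""
    parents = PARENTS[node]
    return 0 if not parents else 1 + min(depth(p) for p in parents)

def ancestors(node):
    """Return the node itself together with all of its ancestors."""
    seen, stack = {node}, [node]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def max_common_depth(a, b):
    return max(depth(n) for n in ancestors(a) & ancestors(b))

def pc_similarity(a, b):
    """Positive control: maximum depth of the common ancestor nodes."""
    return max_common_depth(a, b)

def nc_similarity(a, b):
    """Negative control: reciprocal of that maximum depth."""
    d = max_common_depth(a, b)
    # Assumption: the source does not define the case where d is 0.
    return math.inf if d == 0 else 1 / d

for other in ("Sparse eyelashes", "Nasal obstruction"):
    print(other,
          pc_similarity("Absent inner eyelashes", other),
          nc_similarity("Absent inner eyelashes", other))
```

In this toy hierarchy the positive control ranks sparse eyelashes above nasal obstruction, while the negative control inverts the order, which mirrors the behaviour described in the text.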

Accordingly, the reliability of the performance evaluation system for a symptom similarity measuring device according to an embodiment of the present invention is verified by using it to compare the performance of the first and second symptom similarity measuring devices.

To compare the performance evaluations of the first and second symptom similarity measuring devices, genetic mutation information and symptom information of a total of 160 patients were used, and known symptom information for 7,137 diseases was obtained from the Human Phenotype Ontology (HPO).

For each patient, the diseases are divided into two groups with respect to the genetic mutation the patient carries: diseases associated with that mutation and diseases not associated with it. That is, they are divided into a mutation-associated disease group and a mutation-unrelated disease group.

Then, using the symptom information for each disease, the first symptom similarity measuring device computes the similarity to the patient's symptoms, producing a total of 7,137 similarity values. The distribution of the 7,137 similarity values for each patient is corrected using the mean and standard deviation of these values: the mean is subtracted from each similarity value and the result is divided by the standard deviation to produce a corrected similarity value. Here, the similarity values comprise the first similarities in the mutation-associated disease group and the second similarities in the mutation-unrelated disease group, and the corrected similarity values comprise the first corrected similarities in the mutation-associated disease group and the second corrected similarities in the mutation-unrelated disease group.
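The per-patient correction described above is a standardization (z-score). A minimal Python sketch follows, assuming the population form of the standard deviation (the description does not say which form is intended); the function name, the disease identifiers, and the eight toy scores are hypothetical stand-ins for the 7,137 values of the embodiment.

```python
from statistics import mean, pstdev

def correct_similarities(scores, associated):
    """Standardize one patient's similarity values and split them by group.

    scores: dict mapping a disease id to the raw similarity for this patient.
    associated: set of disease ids associated with the patient's mutation.
    """
    values = list(scores.values())
    ave = mean(values)
    # Assumption: population standard deviation; the source does not say
    # whether the population or sample form is intended.
    stdev = pstdev(values)
    if stdev == 0:
        raise ValueError("all similarity values are identical")
    first, second = [], []
    for disease, value in scores.items():
        corrected = (value - ave) / stdev
        (first if disease in associated else second).append(corrected)
    return first, second

# Toy values: eight diseases instead of 7,137.
scores = {"D1": 2.0, "D2": 4.0, "D3": 4.0, "D4": 4.0,
          "D5": 5.0, "D6": 5.0, "D7": 7.0, "D8": 9.0}
first, second = correct_similarities(scores, associated={"D8"})
print(first)   # corrected similarity of the mutation-associated disease
print(second)  # corrected similarities of the unrelated diseases
```

Because the mean and standard deviation are taken over all of a patient's similarity values, the corrected values of different patients share a common scale before they are pooled.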

This correction is then performed for every patient. Next, over all patients, the first corrected similarity values for the diseases associated with each patient's genetic mutation are collected to produce the first total corrected similarity, and the second corrected similarity values for the diseases unrelated to each patient's genetic mutation are collected to produce the second total corrected similarity. The distributions of the two groups of similarity values are then compared.
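The pooling step can be read as concatenating the per-patient corrected values into two distributions rather than adding them into a single number. The sketch below illustrates that reading; the function name `pool_corrected` and the toy inputs are hypothetical.

```python
from itertools import chain

def pool_corrected(per_patient):
    """Collect corrected similarities of all patients into two distributions.

    per_patient: iterable of (first_corrected, second_corrected) pairs,
    one pair of lists per patient.
    """
    per_patient = list(per_patient)
    first_total = list(chain.from_iterable(first for first, _ in per_patient))
    second_total = list(chain.from_iterable(second for _, second in per_patient))
    return first_total, second_total

# Toy values for two patients, not data from the embodiment.
patients = [
    ([2.0], [-1.5, -0.5, 0.0]),
    ([1.0, 0.5], [-0.5, -1.0]),
]
first_total, second_total = pool_corrected(patients)
print(first_total)
print(second_total)
```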

The same procedure is carried out for the second symptom similarity measuring device through the performance evaluation system for a symptom similarity measuring device according to an embodiment of the present invention.

As a result, for the first symptom similarity measuring device, the mean of the first total corrected similarity is 0.120605 and the mean of the second total corrected similarity is -0.000301522. For the first symptom similarity measuring device, a heteroscedastic (unequal-variance) Student's t-test on the difference between the mean of the first total corrected similarity and the mean of the second total corrected similarity gives a p-value of 0.0000000133842. At a significance level of 0.05, the means of the two distributions therefore differ, and the mean of the first total corrected similarity is greater than the mean of the second total corrected similarity.

For the second symptom similarity measuring device, the mean of the first total corrected similarity is 0.112785 and the mean of the second total corrected similarity is 0.00028197. For the second symptom similarity measuring device, the statistical significance of the difference between the mean of the first total corrected similarity and the mean of the second total corrected similarity is 0.0048305.
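The heteroscedastic t-test mentioned above is commonly known as Welch's t-test. The following sketch computes only the test statistic and the Welch-Satterthwaite degrees of freedom using the Python standard library; the function name `welch_t` and the sample values are hypothetical and do not reproduce the figures reported above. Obtaining a p-value additionally requires the t-distribution, for example via `scipy.stats.ttest_ind` with `equal_var=False`.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    # Squared standard errors of the two sample means (sample variance / n).
    se_a = variance(a) / len(a)
    se_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, df

# Toy samples only, not the pooled distributions of the embodiment.
first_total = [1.0, 2.0, 3.0, 4.0, 5.0]
second_total = [2.0, 4.0, 6.0, 8.0, 10.0]
t, df = welch_t(first_total, second_total)
print(t, df)
```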

Thus, the performance evaluation system for a symptom similarity measuring device according to an embodiment of the present invention shows that the performance of the first symptom similarity measuring device is good and that the second symptom similarity measuring device performs worse than the first.

Hereinafter, a performance evaluation method for a symptom similarity measuring device according to an embodiment of the present invention will be described. For convenience of description, the reference numerals used in FIGS. 1 and 2 above are reused, and content that duplicates the foregoing description is omitted.

FIG. 4 is a flowchart of a performance evaluation method for a symptom similarity measuring device according to an embodiment of the present invention, and FIG. 5 is a flowchart of the similarity calculation unit according to the present invention.

Referring to FIGS. 4 and 5, in the performance evaluation method for a symptom similarity measuring device according to an embodiment of the present invention, the similarity calculation unit 100 calculates similarities comprising a first similarity and a second similarity (S100).

In the similarity calculation (S100), the selection unit 110 selects the symptom similarity measuring device whose performance is to be evaluated (S110), and the receiving unit 130 acquires the patient's genetic mutation information from the database 500 (S120). The disease information search unit 150 then searches the database 500 for information on mutation-associated diseases, which are caused by the genetic mutation, and for information on mutation-unrelated diseases, which are unrelated to the genetic mutation (S130). The symptom information search unit 170 then acquires information on the symptoms known for the mutation-associated diseases and the symptoms known for the mutation-unrelated diseases (S140). Finally, the calculation unit 190 uses the symptom similarity measuring device to measure the first similarity between the patient symptom set and the symptom set of the mutation-associated diseases, and the second similarity between the patient symptom set and the symptom set of the mutation-unrelated diseases (S150).
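Steps S110 to S150 can be sketched as a single function. Everything named here is an assumption made for illustration: the database 500 is modelled as two plain dictionaries, the record layout of `patient` is hypothetical, and the Jaccard index is only a stand-in for whichever symptom similarity measuring device is selected in S110.

```python
def calculate_similarities(measurer, patient, variant_to_diseases, disease_to_symptoms):
    """Return ({disease: first similarity}, {disease: second similarity})."""
    variant = patient["variant"]                              # S120
    associated = set(variant_to_diseases.get(variant, ()))    # S130
    first, second = {}, {}
    for disease, symptoms in disease_to_symptoms.items():     # S140
        score = measurer(patient["symptoms"], symptoms)       # S150
        (first if disease in associated else second)[disease] = score
    return first, second

def jaccard(a, b):
    """Stand-in measurer (not from the source): Jaccard index of two sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical toy records.
patient = {"variant": "V1", "symptoms": {"s1", "s2"}}
variant_to_diseases = {"V1": ["D1"]}
disease_to_symptoms = {"D1": {"s1", "s2"}, "D2": {"s3"}}

# S110: the measurer under evaluation is passed in as a callable.
first, second = calculate_similarities(
    jaccard, patient, variant_to_diseases, disease_to_symptoms
)
print(first)
print(second)
```

Passing the measurer as a callable reflects the point that the evaluation is meant to work with an arbitrary symptom similarity measuring device.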

Next, the similarity correction unit 200 corrects the first similarity to generate a first corrected similarity and corrects the second similarity to generate a second corrected similarity (S200).

Next, the total similarity correction unit 300 generates a first total corrected similarity by combining the first corrected similarities of all patients and a second total corrected similarity by combining the second corrected similarities of all patients (S300).

Next, the verification unit 400 verifies the performance by comparing the first total corrected similarity with the second total corrected similarity (S400).

As described above, the performance evaluation method for a symptom similarity measuring device according to an embodiment of the present invention can verify the performance of a symptom similarity measuring device, which evaluates the similarity between a patient symptom set and a disease symptom set, by using the association between the genetic mutation a patient carries and diseases.

In addition, the performance evaluation method for a symptom similarity measuring device according to an embodiment of the present invention can improve the reliability of the verification, because the similarities produced by the symptom similarity measuring device are corrected over all patients before they are verified.

The present invention described above is not limited to the foregoing embodiments and the accompanying drawings, and it will be apparent to those of ordinary skill in the art to which the present invention pertains that various substitutions, modifications, and changes are possible without departing from the technical idea of the present invention.

100: similarity calculation unit 110: selection unit
130: receiving unit 150: disease information search unit
170: symptom information search unit 190: calculation unit
200: similarity correction unit 300: total similarity correction unit
400: verification unit
1000: performance evaluation system for symptom similarity measuring device

Claims

A performance evaluation system for an arbitrary symptom similarity measuring device that measures the similarity between a set of symptoms a patient has and a set of symptoms known for a disease, the system comprising:
a similarity calculation unit that calculates, for each patient and by means of the symptom similarity measuring device, a first similarity between the patient symptom set and a symptom set of diseases associated with a genetic mutation, and a second similarity between the patient symptom set and a symptom set of diseases unrelated to the genetic mutation;
a similarity correction unit that corrects the first similarity to generate a first corrected similarity and corrects the second similarity to generate a second corrected similarity;
a total similarity correction unit that generates a first total corrected similarity by combining the first corrected similarities of all patients and a second total corrected similarity by combining the second corrected similarities of all patients; and
a verification unit that verifies performance by comparing the first total corrected similarity with the second total corrected similarity.

The performance evaluation system for a symptom similarity measuring device according to claim 1,
wherein the similarity calculation unit comprises:
a selection unit that selects the symptom similarity measuring device whose performance is to be evaluated;
a receiving unit that acquires genetic mutation information of the patient;
a disease information search unit that searches for information on mutation-associated diseases caused by the genetic mutation and information on mutation-unrelated diseases unrelated to the genetic mutation;
a symptom information search unit that searches for information on symptoms known for the mutation-associated diseases and symptoms known for the mutation-unrelated diseases; and
a calculation unit that measures, by means of the symptom similarity measuring device, the first similarity between the patient symptom set and the symptom set of the mutation-associated diseases, and the second similarity between the patient symptom set and the symptom set of the mutation-unrelated diseases.

The performance evaluation system for a symptom similarity measuring device according to claim 1,
wherein the similarity correction unit generates the first corrected similarity by Equation 1 below and generates the second corrected similarity by Equation 2 below.
Equation 1
(first similarity - ave) / stdev
Equation 2
(second similarity - ave) / stdev
(Here, ave is the mean of all of the first and second similarities, and stdev is the standard deviation of all of the first and second similarities.)

The performance evaluation system for a symptom similarity measuring device according to claim 3,
wherein the verification unit compares a representative value of the first total corrected similarity distribution with a representative value of the second total corrected similarity distribution.

The performance evaluation system for a symptom similarity measuring device according to claim 4,
wherein the representative value is a mean or a median.

The performance evaluation system for a symptom similarity measuring device according to claim 4, wherein the verification unit evaluates the performance as better the larger the difference between the representative value of the first total corrected similarity distribution and the representative value of the second total corrected similarity distribution.

A performance evaluation method for a symptom similarity measuring device, the method comprising:
a similarity calculation step of calculating, in a similarity calculation unit and for each patient, a first similarity between a patient symptom set and a symptom set of diseases associated with a genetic mutation, and a second similarity between the patient symptom set and a symptom set of diseases unrelated to the genetic mutation;
a similarity correction step of correcting, in a similarity correction unit, the first similarity to generate a first corrected similarity and the second similarity to generate a second corrected similarity;
a total similarity correction step of generating, in a total similarity correction unit, a first total corrected similarity by combining the first corrected similarities of all patients and a second total corrected similarity by combining the second corrected similarities of all patients; and
a verification step of verifying, in a verification unit, performance by comparing the first total corrected similarity with the second total corrected similarity.

The method of claim 7,
wherein the similarity calculation step comprises:
selecting, in a selection unit, the symptom similarity measuring device whose performance is to be evaluated;
acquiring, in a receiving unit, genetic mutation information of the patient;
searching, in a disease information search unit, for information on mutation-associated diseases caused by the genetic mutation and information on mutation-unrelated diseases unrelated to the genetic mutation;
acquiring, in a symptom information search unit, information on symptoms known for the mutation-associated diseases and symptoms known for the mutation-unrelated diseases; and
measuring, in a calculation unit and by means of the symptom similarity measuring device, the first similarity between the patient symptom set and the symptom set of the mutation-associated diseases, and the second similarity between the patient symptom set and the symptom set of the mutation-unrelated diseases.

The method of claim 7,
wherein, in the similarity correction step,
the similarity correction unit generates the first corrected similarity by Equation 1 below and generates the second corrected similarity by Equation 2 below.
Equation 1
(first similarity - ave) / stdev
Equation 2
(second similarity - ave) / stdev
(Here, ave is the mean of all of the first and second similarities, and stdev is the standard deviation of all of the first and second similarities.)

The method of claim 9,
wherein, in the verification step,
the verification unit compares a representative value of the first total corrected similarity distribution with a representative value of the second total corrected similarity distribution.

The method of claim 10,
wherein the representative value is a mean or a median.

The method of claim 10,
wherein the verification unit evaluates the performance as better the larger the difference between the representative value of the first total corrected similarity distribution and the representative value of the second total corrected similarity distribution.