KR101141103B1

KR101141103B1 - Method of generating decision rule for clinical diagnosis

Info

Publication number: KR101141103B1
Application number: KR1020090124918A
Authority: KR
Inventors: 손창식; 신아미; 이영동; 박형섭; 박희준; 김윤년
Original assignee: 계명대학교 산학협력단
Priority date: 2009-12-15
Filing date: 2009-12-15
Publication date: 2012-05-02
Also published as: KR20110068083A

Abstract

The method for generating a clinical diagnosis decision rule according to the present invention comprises: receiving test data of a plurality of test items for each of a plurality of patients, classifying the data by test item, sorting it by value, and marking each test datum with its clinical diagnosis result; calculating, for each test item, a fitness degree and a cut-off value for the clinical diagnosis decision; generating clinical diagnosis decision rules for the plurality of test items based on the cut-off values; detecting the frequency of occurrence of each clinical diagnosis for the decision rules; and determining some of the decision rules as the final clinical diagnosis decision rules according to those frequencies. The fitness degree indicates how suitable a test item is for the clinical diagnosis decision. It is obtained by detecting, for each test item, the region in which the clinical diagnosis results overlap, calculating for each of the test data in the overlapping region a separability degree indicating how well it can be separated by clinical diagnosis result, and summing the separability degrees. The cut-off value is the weighted average of the test data values within the overlapping region.

Medical data, clinical diagnosis decision

Description

Method of generating decision rule for clinical diagnosis

The present invention relates to medical data processing technology, and more specifically to a method for generating clinical diagnosis decision rules that determine which test items to examine for a patient's clinical diagnosis decision and how to decide the clinical diagnosis from the resulting test data.

Dyspnea is a subjective symptom that can be observed as tachypnea, orthopnea, Cheyne-Stokes respiration, or Kussmaul respiration, and it is one of the most common chief complaints seen in the emergency room.

Patients who present to the emergency room with dyspnea as the chief complaint can be broadly divided into those with cardiogenic disease and those with pulmonary disease. Cardiogenic disease is mainly caused by left ventricular failure, pulmonary edema, and congestive heart failure, while pulmonary disease is mainly caused by chronic obstructive pulmonary disease, pneumonia, and lung cancer.

Because the underlying causes of dyspnea are difficult to diagnose from a brief history-taking, clinicians rely on blood tests, chest radiography, and similar examinations, and they spend considerable time analyzing and differentiating the important features in the results of the tested items.

A typical feature selection method reduces a high-dimensional problem to a lower-dimensional one by selecting a subset of relevant features from the given data or by combining features into new ones. Because it can effectively mitigate the computational complexity and the curse of dimensionality that come with a growing number of features, feature selection is used as a preprocessing step for problems such as pattern classification and decision making. Methods based on neural networks and genetic algorithms in particular have shown notable performance.

However, methods based on neural networks and genetic algorithms require several learning parameters to be tuned when selecting relevant features (for neural networks, the learning rate, momentum, and pruning level of the connection weights; for genetic algorithms, the parameters used in the objective function), and the relationships among the selected features are difficult or impossible to interpret.

Therefore, these machine learning algorithms are not well suited as tools for selecting important features from multivariate or high-dimensional clinical data, and the reliability of the selected features and cut-off values against clinical criteria had to be evaluated separately.

An object of the present invention is to provide a method for generating clinical diagnosis decision rules that determine which test items to examine for a patient's clinical diagnosis decision and how to decide the clinical diagnosis from the resulting test data.

To achieve the above object, the method for generating a clinical diagnosis decision rule according to the present invention comprises: receiving test data of a plurality of test items for each of a plurality of patients, classifying the data by test item, sorting it by value, and marking each test datum with its clinical diagnosis result; calculating, for each test item, a fitness degree and a cut-off value for the clinical diagnosis decision; generating clinical diagnosis decision rules for the plurality of test items based on the cut-off values; detecting the frequency of occurrence of each clinical diagnosis for the decision rules; and determining some of the decision rules as the final clinical diagnosis decision rules according to those frequencies. The fitness degree indicates how suitable a test item is for the clinical diagnosis decision. It is obtained by detecting, for each test item, the region in which the clinical diagnosis results overlap, calculating for each of the test data in the overlapping region a separability degree indicating how well it can be separated by clinical diagnosis result, and summing the separability degrees. The cut-off value is the weighted average of the test data values within the overlapping region.

The present invention makes it possible to obtain highly reliable results when generating clinical diagnosis decision rules that determine which test items to examine for a patient's clinical diagnosis decision and how to decide the clinical diagnosis from the resulting test data.

<Study Subjects and Data Collection>

The present invention was applied to the medical records of 1,129 patients who visited the emergency room of D Medical Center in D Metropolitan City with dyspnea as the chief complaint between July 2006 and June 2007. Data such as registration number, sex, age, date and time of the emergency room visit, treatment outcome, diagnosis at admission, and initial test items, excluding the subjects' personal information, were extracted from the data warehouse.

The initial test items were complete blood cell and differential count (CBC & diff. count), prothrombin time (PT), activated partial thromboplastin time (aPTT), serum electrolytes, routine admission tests, serum amylase, blood pH and gas analysis, lipase, CK-MB, Troponin I, CK, LDH (lactate dehydrogenase), CRP, fibrinogen, Ca²⁺ (calcium), Mg²⁺ (magnesium), and Pro-BNP.

Of the collected data, records were excluded for patients transferred to another hospital, patients who were dead on arrival (DOA), patients who died after cardiopulmonary resuscitation (CPR) or under a do-not-resuscitate (DNR) order, patients who left voluntarily or whose outcome was unknown, and patients with incomplete medical records. Data from the remaining 668 patients (500 admitted, 168 discharged) were analyzed.

<Method for Generating Diagnostic Rules Using the Spatial Distribution of Data>

A method for generating clinical diagnosis decision rules according to a preferred embodiment of the present invention is described with reference to the flowchart of FIG. 1. The processing according to this method is implemented to run on a computing system, here configured as a general server-client system. For example, the system may consist of a plurality of user terminals that request the generation of clinical diagnosis decision rules for patients with a specific condition (for example, patients whose chief complaint is dyspnea), a clinical diagnosis decision server connected to the user terminals over a network, and a clinical diagnosis database (DB) that stores the test data of each such patient (the system configuration is not shown in the drawings). When a user requests, through any one of the terminals, that the clinical diagnosis decision server generate clinical diagnosis decision rules for dyspnea, the server receives the test data from the clinical diagnosis database and carries out the rule generation process described below. The clinical diagnosis database stores test data and related information such as the patient's personal information, registration number, sex, age, date and time of the emergency room visit, treatment outcome, diagnosis at admission, and initial test items. The study subjects described above are an example that uses the test data of patients whose chief complaint is dyspnea.

Upon a user's request to generate clinical diagnosis decision rules, the clinical diagnosis decision server reads the patients' test data from the clinical diagnosis database. The input data received by the server consists of the test data of a plurality of patients; each patient's test data consists of a plurality of test items, and the clinical diagnosis result for each patient has already been determined.
The clinical diagnosis decision server classifies and sorts the received test data of the plurality of patients by test item and marks each test datum with its clinical diagnosis result (step 100). FIG. 2 illustrates this.


The graph in FIG. 2 shows white blood cell counts for a number of patients. Data marked with circles are the white blood cell counts of discharged patients, and data marked with squares are those of admitted patients.

In this way, test data of a plurality of test items for each of a plurality of patients is received, classified by test item, and sorted by value, and each test datum is marked with its clinical diagnosis, admission or discharge. Here, admission or discharge is the clinical diagnosis result associated with the patient's test data at input time; it is evident from the present invention that the clinical diagnosis may vary with the type and purpose of the input data.

When the test data of a plurality of test items for each of a plurality of patients is classified by test item, sorted by value, and marked with the clinical diagnosis results as described above, there is a region in which the clinical diagnosis results overlap. In the example of FIG. 2, region A is where the test data of patients judged for discharge and the test data of patients judged for admission overlap.
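
The sorting, labeling, and overlap detection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the sample values, the function name `overlap_region`, and the restriction to exactly two classes are all hypothetical, and the overlap is taken to be the intersection of the two classes' value ranges.

```python
from collections import defaultdict

# Hypothetical white blood cell counts with their clinical diagnosis labels.
records = [
    (4.2, "Discharge"), (5.1, "Discharge"), (6.8, "Discharge"), (9.5, "Discharge"),
    (6.0, "Admission"), (8.7, "Admission"), (12.3, "Admission"), (15.0, "Admission"),
]


def overlap_region(labeled_values):
    """Return (low, high) of the region where two classes overlap, or None."""
    by_class = defaultdict(list)
    for value, label in sorted(labeled_values):  # sort by value, keep the label
        by_class[label].append(value)
    if len(by_class) != 2:
        raise ValueError("this sketch handles exactly two classes")
    # Each per-class list is sorted, so its first and last items are min and max.
    (lo_a, hi_a), (lo_b, hi_b) = [(v[0], v[-1]) for v in by_class.values()]
    low, high = max(lo_a, lo_b), min(hi_a, hi_b)
    return (low, high) if low <= high else None


region_a = overlap_region(records)
print(region_a)
```

With these sample values the Discharge range is [4.2, 9.5] and the Admission range is [6.0, 15.0], so the overlapping region is their intersection.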

Based on the overlapping region, the clinical diagnosis decision server evaluates the fitness degree for the clinical diagnosis decision. That is, for each of the test data in the overlapping region it calculates a separability degree indicating how well the data can be separated by clinical diagnosis result, and it sums the separability degrees to obtain the fitness degree, which indicates how suitable the corresponding test item is for the clinical diagnosis decision.

Through this process, the fitness degree for the clinical diagnosis decision is calculated for every test item, and the test items can be ranked for that decision based on the calculated fitness degrees.

Once the fitness degree of each test item is determined, the clinical diagnosis decision server calculates, for each test item, a cut-off value for deciding the clinical diagnosis result (step 102). The cut-off value is the weighted average of the test data values within the overlapping region.

Once the fitness degree and the cut-off value of each test item are determined, rule patterns for deciding the clinical diagnosis from the test items can be generated.

That is, the clinical diagnosis decision server generates rule patterns for clinical decision making over the plurality of test items (step 104), detects the frequency of occurrence of each clinical diagnosis decision for those rule patterns (step 106), and selects some of the rule patterns according to the frequencies to determine the final rule patterns (step 108).

The process of generating the final rule patterns is described by example with reference to FIG. 3, which shows the rule space for the white blood cell count and the platelet count.

The horizontal axis of the rule space in FIG. 3 shows the white blood cell count (WBC) test data of the plurality of patients, sorted by value and with each test datum marked with its clinical diagnosis, discharge or admission (DISCHARGE OR ADMISSION). The vertical axis shows the platelet count (PLT) test data, sorted and marked in the same way. The red dotted lines indicate the cut-off values.

The rule patterns for clinical decision making generated for the white blood cell count and platelet count items are as follows.

1-1) Rule 1: WBC is Discharge and PLT is Discharge Then Patient is Admission

1-2) Rule 1: WBC is Discharge and PLT is Discharge Then Patient is Discharge

2-1) Rule 2: WBC is Admission and PLT is Discharge Then Patient is Admission

2-2) Rule 2: WBC is Admission and PLT is Discharge Then Patient is Discharge

3-1) Rule 3: WBC is Discharge and PLT is Admission Then Patient is Admission

3-2) Rule 3: WBC is Discharge and PLT is Admission Then Patient is Discharge

4-1) Rule 4: WBC is Admission and PLT is Admission Then Patient is Admission

4-2) Rule 4: WBC is Admission and PLT is Admission Then Patient is Discharge

Applying each of the above rule patterns to the example of FIG. 3 and detecting the frequency of occurrence (freq.) of each clinical diagnosis gives the following.

1-1) Rule 1: WBC is Discharge and PLT is Discharge Then Patient is Admission with freq. 20

1-2) Rule 1: WBC is Discharge and PLT is Discharge Then Patient is Discharge with freq. 10

2-1) Rule 2: WBC is Admission and PLT is Discharge Then Patient is Admission with freq. 30

2-2) Rule 2: WBC is Admission and PLT is Discharge Then Patient is Discharge with freq. 10

3-1) Rule 3: WBC is Discharge and PLT is Admission Then Patient is Admission with freq. 15

3-2) Rule 3: WBC is Discharge and PLT is Admission Then Patient is Discharge with freq. 20

4-1) Rule 4: WBC is Admission and PLT is Admission Then Patient is Admission with freq. 5

4-2) Rule 4: WBC is Admission and PLT is Admission Then Patient is Discharge with freq. 5
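
The frequency detection of step 106 can be sketched as counting how many patients fall under each combination of rule antecedent and actual diagnosis. Everything specific here is an assumption made for illustration: the cut-off values, the patient data, the helper `section`, and the convention that values below a cut-off belong to the Discharge section are hypothetical, since the text does not state them.

```python
from collections import Counter

# Hypothetical cut-off values; the text does not give the actual ones.
CUTOFFS = {"WBC": 9.0, "PLT": 250.0}


def section(item, value):
    """Assumption: values below the cut-off fall in the Discharge section."""
    return "Discharge" if value < CUTOFFS[item] else "Admission"


# Hypothetical patients: (WBC, PLT, actual clinical diagnosis).
patients = [
    (7.5, 200.0, "Admission"),
    (7.0, 180.0, "Admission"),
    (8.1, 210.0, "Discharge"),
    (11.2, 190.0, "Admission"),
    (6.4, 300.0, "Discharge"),
]

# Key: (WBC section, PLT section, actual diagnosis) -> frequency of occurrence.
freq = Counter(
    (section("WBC", wbc), section("PLT", plt), diagnosis)
    for wbc, plt, diagnosis in patients
)

for (wbc_sec, plt_sec, diagnosis), n in sorted(freq.items()):
    print(f"WBC is {wbc_sec} and PLT is {plt_sec} Then Patient is {diagnosis} with freq. {n}")
```

Each key of `freq` corresponds to one line of the list above, for example 1-1) or 3-2).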

Once the frequencies have been detected, the rule patterns are finalized according to them as follows. For each rule the outcome with the higher frequency is kept; Rule 4, whose two outcomes are tied at 5, does not appear in the final list.

1) Rule 1: WBC is Discharge and PLT is Discharge Then Patient is Admission

2) Rule 2: WBC is Admission and PLT is Discharge Then Patient is Admission

3) Rule 3: WBC is Discharge and PLT is Admission Then Patient is Discharge
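
The selection of the final rules from the frequencies listed in the worked example can be reproduced with a short Python sketch. The frequencies are taken from the example; the data layout and the function name `resolve` are hypothetical, and treating a tie as "no rule" is an interpretation based on Rule 4 being absent from the final list.

```python
# Frequencies from the worked example:
# (WBC section, PLT section) -> {patient outcome: frequency}.
rule_freq = {
    ("Discharge", "Discharge"): {"Admission": 20, "Discharge": 10},
    ("Admission", "Discharge"): {"Admission": 30, "Discharge": 10},
    ("Discharge", "Admission"): {"Admission": 15, "Discharge": 20},
    ("Admission", "Admission"): {"Admission": 5, "Discharge": 5},
}


def resolve(outcomes):
    """Return the most frequent outcome, or None when the top frequency is tied."""
    best = max(outcomes.values())
    winners = [label for label, n in outcomes.items() if n == best]
    return winners[0] if len(winners) == 1 else None


final_rules = {}
for antecedent, outcomes in rule_freq.items():
    outcome = resolve(outcomes)
    if outcome is not None:  # tied rules are left undefined and dropped
        final_rules[antecedent] = outcome

for (wbc_sec, plt_sec), outcome in final_rules.items():
    print(f"WBC is {wbc_sec} and PLT is {plt_sec} Then Patient is {outcome}")
```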

<Generalization>

The present invention is now generalized using mathematical expressions.

The input data $X=\{x_i \mid x_i=(x_{i1},x_{i2},\ldots,x_{in}),\ i=1,2,\ldots,s\}$ is a set of $s$ instances, each an $n$-dimensional vector. The $j$-th attribute (that is, test item) $a_j=\{x_{1j},x_{2j},\ldots,x_{sj}\}$, $j=1,\ldots,n$, consists of $s$ data points, and the output $C=\{c_k \mid k=1,2,\ldots,m\}$ is a set of $m$ classes (clinical diagnoses).

<Step 1: Extracting the Internal Intervals of Attribute Values>

Extract the domain $d_j=[\min(a_j),\max(a_j)]$ of the $j$-th attribute $a_j$ and, for each class, the internal interval $I_{jk}=[I^{l}_{jk},I^{u}_{jk}]$, $I_{jk}\subseteq d_j$, of the $j$-th attribute's values. Here $I^{l}_{jk}$ and $I^{u}_{jk}$ are the minimum and maximum of the attribute values belonging to the $k$-th class in the $j$-th attribute.

<Step 2: Detecting the Overlapping Region Between the Extracted Internal Intervals>

Find the overlapping region $O_j=[O^{l}_{j},O^{u}_{j}]$ among the extracted internal intervals. Here $O^{l}_{j}$ and $O^{u}_{j}$ are the lower and upper bounds over all overlapping regions.

<Step 3: Evaluating Fitness by Calculating the Frequency of Attribute Values>

When different classes do not share the same attribute values within the overlapping region, the frequency of those attribute values is calculated according to Equation 1 and the fitness degree of the attribute is evaluated.

In Equation 1, $t^{k}_{j}$ is the frequency of the unique attribute values belonging to the $k$-th class in the $j$-th attribute, and $s$ is the number of instances. $h^{k}_{j}$ represents the relative degree of separability of the attribute values belonging to the $k$-th class in the overlapping region and takes a value $h^{k}_{j}\in[0,1]$.

For example, suppose that the total number of instances is 20 and that the attribute values in the overlapping region of the $j$-th attribute are distributed by class as shown in Table 1.

For Table 1, Equation 1 gives the fitness degree of the $j$-th attribute as the sum of the separability degrees of class 1 and class 2, $H_j=h^{1}_{j}+h^{2}_{j}=0.2+0.2=0.4$. The maximum degree of separability of the $j$-th attribute is therefore 0.4.
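
Equation 1 and Table 1 are not reproduced in this text, so the following Python sketch rests on two labeled assumptions. First, it assumes Equation 1 has the form $h^{k}_{j}=t^{k}_{j}/s$, which is consistent with $h^{k}_{j}\in[0,1]$ and with the values 0.2 + 0.2 = 0.4 for $s=20$. Second, the attribute values below are hypothetical, chosen only so that each class has four instances whose values are not shared with the other class; they are not the contents of Table 1.

```python
from collections import Counter


def fitness(overlap_values_by_class, total_instances):
    """Return (H_j, {class: h_j^k}) under the assumed form h_j^k = t_j^k / s."""
    counts = {k: Counter(v) for k, v in overlap_values_by_class.items()}
    separability = {}
    for k, counter in counts.items():
        # Values that occur in any other class inside the overlapping region.
        others = set()
        for other_k, other_counter in counts.items():
            if other_k != k:
                others.update(other_counter)
        # t_j^k: instances of class k whose value no other class shares.
        t = sum(n for value, n in counter.items() if value not in others)
        separability[k] = t / total_instances
    return sum(separability.values()), separability


# Hypothetical attribute values inside the overlapping region, by class.
overlap = {
    1: [1.1, 1.1, 1.15, 1.2, 1.25, 1.3, 1.35, 1.4],
    2: [1.1, 1.12, 1.22, 1.3, 1.3, 1.32, 1.38, 1.4, 1.4],
}
H_j, h = fitness(overlap, total_instances=20)
print(H_j, h)
```

In this sample the values 1.1, 1.3, and 1.4 are shared by both classes and do not contribute to separability; the remaining four instances per class give $h^{1}_{j}=h^{2}_{j}=4/20$.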

<Step 4: Determining the Cut-off Value for Each Attribute>

To determine the cut-off value of each attribute, the weighted average, that is, the center of gravity, of the attribute values duplicated across classes within the overlapping region is calculated.

In Equation 2, $x_{ij}$ is the attribute value of the $i$-th instance in the $j$-th attribute, and $n_{ij}$ is the frequency of the duplicated attribute value. In the example of Step 3, the duplicated attribute values 1.1, 1.3, and 1.4 shared by the two classes each have frequency 3, so the center of gravity calculated by Equation 2 is 11.4/9 = 1.2667. Moreover, this cut-off produces fewer misclassifications (6) than the center of gravity of the entire overlapping region, 1.3756, which produces 7.
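
Equation 2 itself is not reproduced in this text. Reading it as the frequency-weighted average $\sum_i n_{ij}x_{ij} / \sum_i n_{ij}$ over the duplicated attribute values reproduces the figure 11.4/9 quoted in the example, and the short Python sketch below checks that arithmetic. The function name `cutoff` and the dictionary layout are hypothetical.

```python
def cutoff(duplicated_value_freq):
    """Weighted average of the values shared between classes, weighted by frequency."""
    total = sum(duplicated_value_freq.values())
    if total == 0:
        raise ValueError("no duplicated values in the overlapping region")
    return sum(x * n for x, n in duplicated_value_freq.items()) / total


# Shared values 1.1, 1.3, and 1.4, each occurring 3 times, as in the example.
c_j = cutoff({1.1: 3, 1.3: 3, 1.4: 3})
print(round(c_j, 4))
```

Because all three frequencies are equal here, the weighted average coincides with the plain mean of the three values; the weights matter only when the duplicated values occur unequally often.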

The computing system then generates rule patterns from the given data using an if-then structure (step 106).

Here, $a_j$ ($j=1,2,\ldots,n$) is the $j$-th attribute; $A_{ik}$ ($i=1,2,\ldots,M$; $k=1,2,\ldots,m$) is the interval of the $k$-th class in the $i$-th rule, obtained by partitioning at the cut-off value calculated from Equation 2; and $freq^{k}$ is the frequency of occurrence of the $i$-th rule pattern in the $k$-th class.

Based on these per-class rule frequencies, conflicts between the rules of different classes are resolved according to Equation 3.

In Equation 3, $R^{*}_{i}$ is the output of the $i$-th rule pattern, and NA denotes a rule that cannot be defined as a rule pattern of any class.
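
Equation 3 is not reproduced in this text. One plausible reading, offered only as an assumption, is that the rule takes the class with the strictly highest frequency and is undefined otherwise. This agrees with the worked WBC/PLT example, where Rules 1 to 3 keep their more frequent outcome and Rule 4 (5 versus 5) is absent from the final list. The index $k^{*}$ is introduced here for illustration and does not appear in the original.

```latex
R^{*}_{i} =
\begin{cases}
c_{k^{*}}, & \text{if } freq^{k^{*}} > freq^{k} \ \text{for all } k \neq k^{*}, \\
\text{NA}, & \text{otherwise.}
\end{cases}
```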

FIG. 1 is a flowchart of a method for generating clinical diagnosis decision rules according to a preferred embodiment of the present invention.

FIG. 2 illustrates data according to a preferred embodiment of the present invention.

FIG. 3 illustrates clinical diagnosis decision rules according to a preferred embodiment of the present invention.

Claims

In a server system networked with a plurality of user terminals and with a database storing test data of a plurality of test items for each of a plurality of patients,

when a request is received from a user terminal, the server performs:

receiving the test data from the database, classifying it by test item, sorting it by value, and marking each test datum with a clinical diagnosis result;

calculating, for each test item, a fitness degree and a cut-off value for the clinical diagnosis decision;

generating clinical diagnosis decision rules for the plurality of test items based on the cut-off values;

detecting the frequency of occurrence of each clinical diagnosis for the clinical diagnosis decision rules; and

determining some of the clinical diagnosis decision rules as the final clinical diagnosis decision rules according to the frequency of occurrence and transmitting them to the user terminal,

wherein the fitness degree indicates how suitable the test item is for the clinical diagnosis decision,

and is obtained by detecting, for each test item, the region in which the clinical diagnosis results overlap, calculating for each of the test data in the overlapping region a separability degree indicating how well it can be separated by clinical diagnosis result, and summing the separability degrees,

and wherein the cut-off value is a weighted average of the test data values within the overlapping region.

The method of claim 1,

wherein the clinical diagnosis decision rule,

when test data of a plurality of test items is received for a patient, decides the clinical diagnosis according to which clinical diagnosis section, divided at the cut-off value detected for each test item, the test data of each test item falls into.

The method of claim 1,

wherein the cut-off value is calculated according to Equation 4.

In Equation 4, $x_{ij}$ denotes the $i$-th data value in the $j$-th test item, and $n_{ij}$ denotes the frequency of the data values whose clinical diagnosis results are duplicated.

The method of claim 1,

wherein the fitness degree

is obtained by calculating, according to Equation 5, the frequency of the data values for which different clinical diagnosis results do not share the same data value in the overlapping region.

In Equation 5, $t^{k}_{j}$ denotes the frequency of the unique attribute values belonging to the $k$-th clinical diagnosis in the $j$-th test item, $s$ denotes the number of data, and $h^{k}_{j}$ denotes the relative degree of separability of the test items belonging to the $k$-th clinical diagnosis in the overlapping region and takes a value $h^{k}_{j}\in[0,1]$.

When a test item selection for the clinical diagnosis decision is requested from the user terminal, the server performs:

receiving test data of a plurality of test items for each of a plurality of patients from the database, classifying it by test item, sorting it by value, and marking each test datum with a clinical diagnosis result; and

calculating the fitness degree for each test item and presenting it,

wherein the region in which the clinical diagnosis results overlap is detected for each test item,

and the fitness degree is obtained by calculating, according to Equation 6, the frequency of the data values for which different clinical diagnosis results do not share the same data value in the overlapping region.

In Equation 6, $t^{k}_{j}$ denotes the frequency of the unique attribute values belonging to the $k$-th clinical diagnosis in the $j$-th test item, $s$ denotes the number of data, and $h^{k}_{j}$ denotes the relative degree of separability of the test items belonging to the $k$-th clinical diagnosis in the overlapping region and takes a value $h^{k}_{j}\in[0,1]$.