KR102467865B1 - System for Refining Hospital Laboratory Data through Ontology Database Driven Rule-based Algorithm

Info

Publication number: KR102467865B1
Application number: KR1020200081236A
Authority: KR
Inventors: 주형준; 김종호; 박수완; 장준호
Original assignee: 고려대학교 산학협력단
Priority date: 2020-07-02
Filing date: 2020-07-02
Publication date: 2022-11-17
Also published as: KR20220003704A

Abstract

A system for refining hospital diagnostic test results using an ontology database-based rule-based algorithm according to an embodiment of the present invention may include: an ontology database built in advance for refining hospital diagnostic test results; a data introduction unit that generates data by converting input hospital diagnostic test results into a predetermined form; a data pre-processing unit that receives the data generated by the data introduction unit and performs, based on rules, at least one of data format conversion and verification, data deletion, and data separation; a data extraction unit that receives the data preprocessed by the data pre-processing unit and extracts data based on rules according to predefined criteria based on the ontology database; and a data post-processing unit that analyzes the data extracted by the data extraction unit against the original data and evaluates the possibility of errors in data conversion.

Description

System for Refining Hospital Laboratory Data through Ontology Database Driven Rule-based Algorithm

This application relates to a system for refining hospital diagnostic test results using an ontology database-based rule-based algorithm.

Data refinement is essential for building big data in the medical field. However, most data held by medical institutions consists of uncoded, unstructured text or images. Hospital test results in particular use specialized terminology and domain-specific structures, which makes it difficult to refine them into valuable data through a general data refinement process.

In practice, no technology exists for building all of a medical institution's clinical data into a database tailored to each medical field, and developing a refinement system that covers every medical field is not realistically possible.

Therefore, building high-quality big data requires developing an automated algorithm suited to each type of test.

Meanwhile, the results of diagnostic tests performed at medical institutions are not only important for deciding a patient's treatment but also account for the largest share of data by volume. Some of these results consist of numeric data, but many mix in letters or symbols, so the results are stored as characters, that is, as text data. To use and/or analyze such data properly, the portions that correspond to other data types must be extracted from the text and converted.

In particular, although hospital diagnostic test results are in some cases entered automatically into a server such as a hospital information system by recently automated systems, they are sometimes corrected by hand because of system errors, and depending on the result the person in charge may add further comments or annotations. This process damages the consistency of the data format, and for this reason the data is currently stored as unstructured text.

Accordingly, there is a need in the art for a way to refine hospital diagnostic test result data more accurately and efficiently.

In particular, there is a need for a way to build type-specific databases from hospital diagnostic test results automatically and within a short time and, where necessary, to convert the results into the terms and codes the user wants.

To solve the above problems, an embodiment of the present invention provides a system for refining hospital diagnostic test results using an ontology database-based rule-based algorithm.

The system for refining hospital diagnostic test results using an ontology database-based rule-based algorithm may include: an ontology database built in advance for refining hospital diagnostic test results; a data introduction unit that generates data by converting input hospital diagnostic test results into a predetermined form; a data pre-processing unit that receives the data generated by the data introduction unit and performs, based on rules, at least one of data format conversion and verification, data deletion, and data separation; a data extraction unit that receives the data preprocessed by the data pre-processing unit and extracts data based on rules according to predefined criteria based on the ontology database; and a data post-processing unit that analyzes the data extracted by the data extraction unit against the original data and evaluates the possibility of errors in data conversion.

The means for solving the problem described above do not list every feature of the present invention. The various features of the present invention, and the advantages and effects that follow from them, will be understood in more detail with reference to the specific embodiments below.

According to an embodiment of the present invention, a high-quality database can be built by refining hospital diagnostic test results, which are heavily used in medical institutions, with an ontology database-based rule-based algorithm.

This increases the multi-institution interoperability of the database and makes data analysis easier.

FIG. 1 is a block diagram of a system for refining hospital diagnostic test results using an ontology database-based rule-based algorithm according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating an implementation example of the data introduction unit shown in FIG. 1.
FIG. 3 is a diagram illustrating the processing flow of the data pre-processing unit shown in FIG. 1.
FIG. 4 is a diagram illustrating an implementation example of the data pre-processing unit shown in FIG. 1.
FIG. 5 is a diagram illustrating the processing flow of the data extraction unit shown in FIG. 1.
FIG. 6 is a diagram illustrating an implementation example of the data extraction unit shown in FIG. 1.
FIG. 7 is a diagram illustrating the processing flow of the data post-processing unit shown in FIG. 1.
FIG. 8 is a diagram illustrating an implementation example of the data post-processing unit shown in FIG. 1.
FIG. 9 is a diagram illustrating an example of the result of processing hospital diagnostic test results with the system for refining hospital diagnostic test results using an ontology database-based rule-based algorithm according to an embodiment of the present invention.

Hereinafter, preferred embodiments are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the present invention pertains can readily practice the invention. In describing the preferred embodiments, a detailed description of a related known function or configuration is omitted where it is judged that it would unnecessarily obscure the gist of the present invention. The same reference numerals are used throughout the drawings for parts that have similar functions and operations.

In addition, throughout the specification, when a part is said to be 'connected' to another part, this includes not only the case where it is 'directly connected' but also the case where it is 'indirectly connected' with another element in between. Further, 'including' a component means that other components may additionally be included, not that other components are excluded, unless specifically stated otherwise.

FIG. 1 is a block diagram of a system for refining hospital diagnostic test results using an ontology database-based rule-based algorithm according to an embodiment of the present invention.

Referring to FIG. 1, a system 100 for refining hospital diagnostic test results using an ontology database-based rule-based algorithm according to an embodiment of the present invention may include a data introduction unit 110, a data pre-processing unit 120, a data extraction unit 130, a data post-processing unit 140, a database server 150, and an ontology database 160.
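
The arrangement of units 110 to 140 can be pictured as a chain of functions, each taking records and returning records. The following is a minimal sketch under stated assumptions: the patent discloses no source code, so the function names (`ingest`, `preprocess`, `extract`, `postprocess`, `run_pipeline`), the record layout, and the toy rules are all hypothetical stand-ins for the ontology-driven rules.

```python
# Hypothetical stand-ins for units 110-140; none of these names or rules
# come from the patent. Each stage maps a list of records to a list of records.
Record = dict


def ingest(raw_results: list[str]) -> list[Record]:
    """Data introduction unit (110): wrap each raw result in a common record form."""
    return [{"original": text, "value": text} for text in raw_results]


def preprocess(records: list[Record]) -> list[Record]:
    """Data pre-processing unit (120): normalise text and drop empty results."""
    cleaned = [{**r, "value": r["value"].strip().lower()} for r in records]
    return [r for r in cleaned if r["value"]]


def extract(records: list[Record]) -> list[Record]:
    """Data extraction unit (130): toy rule that flags plain decimal numbers."""
    return [{**r, "is_number": r["value"].replace(".", "", 1).isdigit()} for r in records]


def postprocess(records: list[Record]) -> list[Record]:
    """Data post-processing unit (140): compare each result with its original."""
    return [{**r, "checked": r["original"].strip().lower() == r["value"]} for r in records]


def run_pipeline(raw_results: list[str]) -> list[Record]:
    records = ingest(raw_results)
    for stage in (preprocess, extract, postprocess):
        records = stage(records)
    return records


result = run_pipeline(["  12.5 ", "", "Positive"])
```

Keeping the original text on every record is a design choice made here so that the last stage can compare against it, in the spirit of the post-processing unit's check against the original data.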

The data introduction unit 110 may receive data stored on a server of a medical institution, such as a hospital information system, that is, hospital diagnostic test results, and generate data by converting the results into a predetermined form that the components described below can process. Here, the hospital diagnostic test results are those collected and stored at a medical institution; they may have been written by a person or extracted from medical equipment, and may be stored in various forms. The hospital diagnostic test results may include a plurality of results as well as a single result, and may consist of a combination of at least one of letters, numbers, and symbols.

According to one embodiment, the data introduction unit 110 may receive text entered directly by a user, or may receive data in an arbitrary file format such as a CSV (comma-separated values) file, an Excel file, or a text file. These file formats are only examples; input is not limited to a specific format, and data may be received in any file format.
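
As an illustration of file-based input, CSV text can be read into a uniform list of records with Python's standard `csv` module. This is only a sketch: the column names (`test_code`, `test_date`, `result`) and the sample rows are made up for the example, and real hospital exports will differ.

```python
import csv
import io

# Made-up sample data with hypothetical column names.
SAMPLE_CSV = """test_code,test_date,result
L3021,2020-07-02,>= 12.5 mg/dL
L4410,2020-07-03,Positive (2+)
"""


def read_results(csv_text: str) -> list[dict]:
    """Read CSV text and return every row as a dict of strings."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


rows = read_results(SAMPLE_CSV)
```

Every field is kept as a string at this point, which matches the description's observation that mixed results are stored as text.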

According to another embodiment, the data introduction unit 110 may receive data by connecting, through a DBMS (database management system) such as MS-SQL, to a database in which hospital diagnostic test results are stored.

FIG. 2 is a diagram illustrating an implementation example of the data introduction unit shown in FIG. 1.

According to one embodiment, the data introduction unit 110 may, following the algorithm shown in (a) of FIG. 2, connect to the database in which the hospital diagnostic test results are stored, load the data, and generate data converted into the predetermined form.
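
The connect, load, and convert sequence might look like the sketch below. The description names MS-SQL as an example DBMS; this sketch instead uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in so that it runs anywhere. The table name `lab_results`, its columns, and the sample rows are assumptions.

```python
import sqlite3


def load_results(conn: sqlite3.Connection) -> list[dict]:
    """Fetch all stored results and convert them to a uniform list of dicts."""
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    cursor = conn.execute(
        "SELECT test_code, test_date, result FROM lab_results ORDER BY rowid"
    )
    return [dict(row) for row in cursor]


# In-memory database standing in for the hospital's result store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lab_results (test_code TEXT, test_date TEXT, result TEXT)")
conn.executemany(
    "INSERT INTO lab_results VALUES (?, ?, ?)",
    [("L3021", "2020-07-02", "< 0.5"), ("L4410", "2020-07-03", "Negative")],
)
loaded = load_results(conn)
conn.close()
```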

Part (b) of FIG. 2 shows the connection information and status values of the database, and part (c) of FIG. 2 shows the time the data introduction unit 110 took to load data from the database. In one embodiment, loading 7.8 million records was confirmed to take about 9.56 minutes.
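
As a rough worked figure, taking the reported 7.8 million records and 9.56 minutes at face value (both are approximate in the source), the implied loading rate is:

```latex
\frac{7.8 \times 10^{6}\ \text{records}}{9.56\ \text{min} \times 60\ \text{s/min}}
  = \frac{7.8 \times 10^{6}}{573.6\ \text{s}}
  \approx 1.36 \times 10^{4}\ \text{records/s}
```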

The data pre-processing unit 120 may receive the data generated by the data introduction unit 110 and perform, based on rules, at least one of data format conversion and verification, data deletion, and data separation.

FIG. 3 is a diagram illustrating the processing flow of the data pre-processing unit shown in FIG. 1.

Referring to FIG. 3, the data pre-processing unit 120 may analyze the input original data and convert it into a character format, verify the data format and delete unnecessary data, and separate data that contains multiple pieces of information.

Specifically, when the data pre-processing unit 120 receives the original data (S310), it checks whether the data type of the data is a character format (S320), converts the data into a character format as needed (S330), and generates temporary data (temp_df) (S340).

Next, the data pre-processing unit 120 performs verification based on predefined data format verification rules (S350), determines whether the data conforms to the diagnostic test result format (S360), deletes some or all of the data if it does not (S370), and generates the preprocessed data (전처리_df) (S380).
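
The S310 to S380 flow can be sketched as follows. The actual verification rules are not given in the description, so the rule used here (a usable result contains at least one ASCII letter or digit) is a hypothetical placeholder, as are the names `VALID_RESULT`, `to_text`, and `preprocess`.

```python
import re

# Hypothetical validation rule: a usable result contains at least one
# letter or digit. The patent's actual verification rules are not disclosed.
VALID_RESULT = re.compile(r"[0-9A-Za-z]")


def to_text(value: object) -> str:
    """S320/S330: make sure the value is held as a string."""
    return value if isinstance(value, str) else str(value)


def preprocess(original: list[object]) -> list[str]:
    # S340: temporary data with every value in character format.
    temp = [to_text(v).strip() for v in original if v is not None]
    # S350-S370: validate each value and delete those that fail the rule.
    return [v for v in temp if VALID_RESULT.search(v)]


preprocessed = preprocess([12.5, "  Positive ", None, "---", "<0.5"])
```

In this sketch a failing value is deleted whole; the description also allows deleting only part of a value, which would need rules specific to the data.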

FIG. 4 is a diagram illustrating an implementation example of the data pre-processing unit shown in FIG. 1. As shown in FIG. 4, data format verification and separation/deletion may include verifying and separating codes, dates, and test result types, as well as steps such as converting uppercase letters to lowercase.
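
Two of the steps named here, date verification and separation with lowercase conversion, could be sketched as below. The date format `YYYY-MM-DD` and the `;` separator are assumptions chosen for the example; FIG. 4 itself is not reproduced in this text.

```python
from datetime import datetime


def is_valid_date(text: str) -> bool:
    """Return True if text parses as a calendar date in year-month-day form."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def split_results(text: str, separator: str = ";") -> list[str]:
    """Separate a field holding several results and lowercase each part."""
    return [part.strip().lower() for part in text.split(separator) if part.strip()]


parts = split_results("WBC 2+; RBC Negative ;")
```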

The data extraction unit 130 may receive the preprocessed data and extract data based on rules according to predefined criteria based on the ontology database 160.

According to one embodiment, the data extraction unit 130 may extract data by predefined type (for example, N types of data).

FIG. 5 is a diagram illustrating the processing flow of the data extraction unit shown in FIG. 1.

Referring to FIG. 5, the data extraction unit 130 sequentially checks whether values in the preprocessed data (전처리_df) correspond to predefined types of data, including operator (or sign) data (for example, data expressing 'greater than or equal to', 'less than or equal to', 'greater than', and 'less than'), unit data, category data, ratio data, grade data, type data, and number data (S510, S520, S530, S540, S550, S560, S570), and extracts each identified value as data of the corresponding type (S515, S525, S535, S545, S555, S565, S575), thereby generating the extracted data (Extract_df).

Meanwhile, data that is not extracted through the above process may be collected as post-processing data (후처리_df) for post-processing by the data post-processing unit 140 described below.
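
The sequential check-and-extract flow, with unmatched text set aside for post-processing, can be sketched as follows. The miniature `ONTOLOGY` table, the regular expressions in `PATTERNS`, and the function name `extract` are all hypothetical; the real ontology database 160 and its rules are far larger and are not reproduced in the description.

```python
import re

# Hypothetical miniature ontology, checked first and in this order.
ONTOLOGY = {
    "operator": ["<=", ">=", "<", ">"],
    "unit": ["mg/dl", "mmol/l", "%"],
    "category": ["positive", "negative"],
}
# Pattern-based types, checked after the ontology lookups.
PATTERNS = {
    "ratio": re.compile(r"\d+\s*:\s*\d+"),
    "grade": re.compile(r"\d\+"),
    "number": re.compile(r"\d+(?:\.\d+)?"),
}


def extract(value: str) -> tuple[dict, str]:
    """Check each type in order; return extracted fields and unmatched residue."""
    fields, rest = {}, value.lower()
    for kind, terms in ONTOLOGY.items():
        for term in terms:
            if term in rest:
                fields[kind] = term
                rest = rest.replace(term, " ", 1)  # consume the matched text
                break
    for kind, pattern in PATTERNS.items():
        match = pattern.search(rest)
        if match:
            fields[kind] = match.group()
            rest = rest[: match.start()] + " " + rest[match.end():]
    return fields, " ".join(rest.split())


fields, residue = extract(">= 12.5 mg/dL (recheck)")
```

Here `fields` holds the operator, unit, and number, while `residue` holds the leftover comment, which is the kind of text that would go into the post-processing data.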

FIG. 6 is a diagram illustrating an implementation example of the data extraction unit shown in FIG. 1. Referring to FIG. 6, to extract the required medical information from the data it receives, the data extraction unit 130 extracts symbols, units, primary text, primary categories, secondary text, numbers, and secondary categories, and can thereby extract high-purity data.

The processing flow of the data extraction unit 130 and the types of extracted data are not necessarily limited to the above and may be changed according to the structure of the input data. That is, because the structure of hospital diagnostic test result data may differ between medical institutions, the extraction order and the types of extracted data may be changed as needed so that data of higher purity is extracted.
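
Why the extraction order matters can be seen with a titre-like value such as `1:160`. In the sketch below the rule table `RULES` and the function `extract_in_order` are hypothetical; the point is only that the same rules applied in a different order give a different result, so the order is a natural thing to configure per institution.

```python
import re

# Hypothetical rule table; names and patterns are illustrative only.
RULES = {
    "ratio": re.compile(r"\d+\s*:\s*\d+"),
    "number": re.compile(r"\d+(?:\.\d+)?"),
}


def extract_in_order(value: str, order: list[str]) -> dict:
    """Apply the named rules in the given order, consuming matched text."""
    fields, rest = {}, value
    for kind in order:
        match = RULES[kind].search(rest)
        if match:
            fields[kind] = match.group()
            rest = rest[: match.start()] + " " + rest[match.end():]
    return fields


ratio_first = extract_in_order("1:160", ["ratio", "number"])
number_first = extract_in_order("1:160", ["number", "ratio"])
```

With the ratio rule first the whole value is captured as a ratio; with the number rule first only the leading `1` is captured and the ratio is lost.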

The data post-processing unit 140 may analyze the data extracted by the data extraction unit 130 (Extract_df) against the original data, evaluate the possibility of errors in data conversion, and output the final result. In addition, for the post-processing data (후처리_df) not extracted by the data extraction unit 130, it may perform data extraction based on predefined additional extraction rules and update the ontology database 160.

The data post-processing unit 140 may also convert the extracted data into the standard terms and standard codes desired by the user that are stored in the ontology database 160.

FIG. 7 is a diagram illustrating the processing flow of the data post-processing unit shown in FIG. 1.

Referring to FIG. 7, if data remains in the post-processing data (후처리_df) (S710, S715), the data post-processing unit 140 may perform extraction based on predefined additional extraction rules and add the result to the extracted data (Extract_df) (S720).

If data still remains after the additional extraction (S725), the ontology stored in the ontology database 160 may be analyzed and the ontology database 160 may be updated (S730, S735).
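
One way to picture the additional extraction followed by an ontology update is sketched below. The description does not say what the additional rules are or how the ontology is analyzed, so everything here is an assumption: the rule (comments in parentheses), the candidate counting, and the threshold of two occurrences before a term is added.

```python
import re
from collections import Counter

# Hypothetical additional rule: free-text comments wrapped in parentheses.
ADDITIONAL_RULE = re.compile(r"\(([^)]*)\)")


def post_extract(leftovers: list[str], ontology_terms: set[str]) -> tuple[list[str], Counter]:
    """Apply the additional rule, then count what still remains as ontology candidates."""
    comments, candidates = [], Counter()
    for text in leftovers:
        match = ADDITIONAL_RULE.search(text)
        if match:
            comments.append(match.group(1))
            text = ADDITIONAL_RULE.sub(" ", text, count=1)
        for token in text.split():
            if token not in ontology_terms:
                candidates[token] += 1
    return comments, candidates


known = {"positive", "negative"}
comments, candidates = post_extract(["(recheck)", "trace", "trace (hemolyzed)"], known)
# Assumed policy: add a term once it has been seen at least twice.
known.update(term for term, count in candidates.items() if count >= 2)
```

In practice a person would likely review candidate terms before they enter the ontology; the automatic threshold is used here only to keep the sketch short.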

Meanwhile, the extracted data (Extract_df) may be merged with the data additionally extracted by the additional extraction rules to generate extended extracted data (Extract_Ext) (S720, S740, S745). Whether to apply term mapping to the extended extracted data (Extract_Ext) is decided according to the user's needs (S750). If term mapping is needed, conversion into standard terms and standard codes may be performed based on term mapping rules that use predefined metadata and the like (S755); if it is not needed, the conversion step may be omitted. The data format may then be checked and corrected according to predefined data cleaning rules, and the final result (Final_df) may be output (S765).
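
The optional term-mapping step (S750, S755) can be sketched as a table lookup that is skipped when the user does not ask for it. The table `TERM_MAP` and the function `map_terms` are hypothetical, and the codes are placeholders, not real LOINC or other standard codes.

```python
# Hypothetical mapping table; the codes are placeholders, not real
# LOINC or other standard codes.
TERM_MAP = {
    "pos": ("positive", "STD-0001"),
    "positive": ("positive", "STD-0001"),
    "neg": ("negative", "STD-0002"),
    "negative": ("negative", "STD-0002"),
}


def map_terms(rows: list[dict], apply_mapping: bool) -> list[dict]:
    """S750/S755: optionally add a standard term and code to each row."""
    if not apply_mapping:
        return [dict(row) for row in rows]
    mapped = []
    for row in rows:
        term, code = TERM_MAP.get(row.get("category", ""), (None, None))
        mapped.append({**row, "standard_term": term, "standard_code": code})
    return mapped


final = map_terms([{"category": "pos"}, {"category": "trace"}], apply_mapping=True)
unmapped = map_terms([{"category": "pos"}], apply_mapping=False)
```

Rows with no match keep `None` in the standard fields rather than being dropped, so unmapped terms stay visible for later review.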

FIG. 8 is a diagram illustrating an implementation example of the data post-processing unit shown in FIG. 1. As shown in (a) of FIG. 8, the extracted data may be analyzed against the original data to verify errors in data conversion, and columns may be organized based on the ontology database 160.
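
The description does not state how the check against the original data is carried out. One simple possibility, shown here purely as an assumption, is to flag a record when any extracted value cannot be found in the original text; the function name `conversion_suspect` is hypothetical.

```python
def conversion_suspect(original: str, fields: dict) -> bool:
    """Return True if any extracted value cannot be found in the original text."""
    haystack = original.lower()
    return any(str(value).lower() not in haystack for value in fields.values())


ok = conversion_suspect(
    ">= 12.5 mg/dL", {"operator": ">=", "number": "12.5", "unit": "mg/dl"}
)
bad = conversion_suspect(
    ">= 12.5 mg/dL", {"operator": ">=", "number": "125", "unit": "mg/dl"}
)
```

A substring test like this catches values that were altered during extraction, such as a dropped decimal point, but not values assigned to the wrong type.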

Part (b) of FIG. 8 shows the time taken to run the algorithm of (a) of FIG. 8. In one embodiment, processing and verifying more than 200 million records was confirmed to take about 35 minutes.

The database server 150 may store and provide the data output by the data post-processing unit 140.

The ontology database 160 is a database built by organizing the symbols and terms used in hospital diagnostic tests (for example, units, frequently repeated phrases, abbreviations, standard terms, and standard codes), and may be built on the international standard LOINC (Logical Observation Identifiers Names and Codes) and on data collected from medical institutions.
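
The kind of content listed here (units, abbreviations, standard terms, standard codes) suggests entries that tie several surface forms to one standard term. The sketch below is one hypothetical layout; the class `OntologyEntry`, its fields, and the sample entries are assumptions, and the code field is left empty rather than guessing real codes.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyEntry:
    """One term in a hypothetical ontology table."""
    standard_term: str
    kind: str                      # e.g. "unit", "category", "operator"
    standard_code: str = ""        # left empty here; no real codes are assumed
    synonyms: set[str] = field(default_factory=set)


def build_index(entries: list[OntologyEntry]) -> dict[str, OntologyEntry]:
    """Index every standard term and synonym (lowercased) to its entry."""
    index = {}
    for entry in entries:
        for name in {entry.standard_term, *entry.synonyms}:
            index[name.lower()] = entry
    return index


index = build_index([
    OntologyEntry("mg/dL", "unit", synonyms={"mg/dl", "MG/DL"}),
    OntologyEntry("positive", "category", synonyms={"pos", "(+)"}),
])
```

Lowercasing the keys lines up with the pre-processing step that converts uppercase letters to lowercase, so lookups can use the preprocessed text directly.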

FIG. 9 is a diagram illustrating an example of the result of processing hospital diagnostic test results with the system for refining hospital diagnostic test results using an ontology database-based rule-based algorithm according to an embodiment of the present invention.

Hospital diagnostic test results input as shown in (a) of FIG. 9 are processed according to the procedure described above with reference to FIGS. 1 to 8, yielding result data extracted and organized by data type as shown in (b) of FIG. 9.

When the embodiment of the present invention described above was applied to about 200 million sample records stored in a general hospital's diagnostic test result database, data was extracted by data type to build a more refined database, and some of that data could be converted into standard medical terms and codes.

In addition, a quality check of 7,840,263 records left only 43 types of unprocessed data and only 143 unprocessed records (0.0018%), confirming that a high-quality database can be built.
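
The reported percentage follows directly from the two counts:

```latex
\frac{143}{7{,}840{,}263} \approx 1.82 \times 10^{-5} \approx 0.0018\,\%
```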

The present invention is not limited by the embodiments described above or by the accompanying drawings. It will be apparent to a person of ordinary skill in the art to which the present invention pertains that the components of the present invention may be substituted, modified, and changed without departing from the technical spirit of the present invention.

100: system for refining hospital diagnostic test results
110: data introduction unit
120: data pre-processing unit
130: data extraction unit
140: data post-processing unit
150: database server
160: ontology database

Claims

A system for refining hospital diagnostic test results using an ontology database-based rule-based algorithm, the system comprising:
an ontology database built in advance for refining hospital diagnostic test results;
a data introduction unit that generates data by converting input hospital diagnostic test results into a predetermined form;
a data pre-processing unit that receives the data generated by the data introduction unit and performs, based on rules, at least one of data format conversion and verification, data deletion, and data separation;
a data extraction unit that receives the data preprocessed by the data pre-processing unit and extracts data based on rules according to predefined criteria based on the ontology database; and
a data post-processing unit that analyzes the data extracted by the data extraction unit against the original data and evaluates the possibility of errors in data conversion,
wherein the data pre-processing unit analyzes the input data and converts it into a character format, or verifies the data format based on a predefined data format verification rule to delete unnecessary data, and separates data containing a plurality of pieces of information, and
wherein the data post-processing unit performs data extraction, based on a predefined additional extraction rule, on post-processing data not extracted by the data extraction unit, and updates the ontology database.

The system according to claim 1,
wherein the hospital diagnostic test results are collected and stored at a medical institution and are written by a person or extracted from medical equipment.

The system according to claim 1,
wherein the hospital diagnostic test results include a single result or a plurality of results and consist of a combination of at least one of letters, numbers, and symbols.

The system according to claim 1,
wherein the data introduction unit loads data by connecting to a database in which the hospital diagnostic test results are stored.

(Deleted)

The system according to claim 1,
wherein the data extraction unit extracts data by predefined type.

The system according to claim 6,
wherein the predefined types include at least one of operator data, unit data, category data, ratio data, grade data, type data, text data, and number data.

The system according to claim 6,
wherein the data extraction unit sequentially checks, in a predefined order, whether data corresponds to each of the predefined types, and extracts each identified value as data of the corresponding type.

(Deleted)

The system according to claim 1,
wherein the data post-processing unit converts the extracted data into standard terms and standard codes based on predefined term mapping rules and data cleaning rules.

The system according to claim 1,
further comprising a database server that stores and provides the data output by the data post-processing unit.

The system according to claim 1,
wherein the ontology database is a database built by organizing symbols and terms used in hospital diagnostic tests, based on the international standard LOINC (Logical Observation Identifiers Names and Codes) and on data collected from medical institutions.