KR102521961B1

KR102521961B1 - Method and device for matching clinical trials data

Info

Publication number: KR102521961B1
Application number: KR1020220060807A
Authority: KR
Inventors: 정지희
Original assignee: (주)메디아이플러스
Priority date: 2019-12-30
Filing date: 2022-05-18
Publication date: 2023-04-14
Also published as: KR20210084909A; WO2021137359A1; KR20220070398A

Abstract

The present invention relates to a clinical trial data matching method and device. The clinical trial data matching device includes: a data receiving unit that receives first clinical trial data from at least one website containing clinical trial data; a data extraction unit that analyzes the first clinical trial data, extracts first valid data, standardizes it, and stores it in a database; a calculation unit that calculates a first similarity between the first valid data and second valid data stored in the database, and calculates a second similarity by applying different weights to the first similarity; a data matching unit that matches the first clinical trial data with second clinical trial data extracted through the second similarity; and a display unit that displays the first clinical trial data and the second clinical trial data.

Description

Clinical trial data matching method and device {METHOD AND DEVICE FOR MATCHING CLINICAL TRIALS DATA}

The present invention relates to a clinical trial data matching method and device, and more particularly to a clinical trial data matching method and device that finds data in a database similar to given clinical trial data and provides it to a user.

A clinical trial is a test conducted on human subjects to confirm the safety, pharmacokinetics, pharmacological effect, and clinical effect of a medicine before the medicine is developed and marketed. Managing the data obtained through clinical trials is an important factor in raising the reliability and accuracy of clinical trial data.

In general, the format of clinical trial data differs depending on the platform to which it is uploaded. It is therefore difficult to grasp at a glance the clinical trial results scattered across multiple platforms, and the data is not easy to share. Accordingly, there is a need for a technology that matches similar results based on the data of earlier clinical trials published on a plurality of platforms and thereby provides a variety of clinical trial data for a specific medicine.

The present invention is intended to solve the problems described above, and one object of the present invention is to provide a user with other clinical trial data that matches given clinical trial data.

Another object of the present invention is to build a database of clinical trial data by collecting clinical trial data scattered across a plurality of platforms.

Another object of the present invention is to measure the similarity between clinical trial data more accurately by applying a weight to each index when matching clinical trial data.

To achieve these objects, a clinical trial data matching device includes: a data receiving unit that receives first clinical trial data from at least one website containing clinical trial data; a data extraction unit that analyzes the first clinical trial data, extracts first valid data, standardizes it, and stores it in a database; a calculation unit that calculates a first similarity between the first valid data and second valid data stored in the database, and calculates a second similarity by applying different weights to the first similarity; a data matching unit that matches the first clinical trial data with second clinical trial data extracted through the second similarity; and a display unit that displays the first clinical trial data and the second clinical trial data.

In addition, to achieve these objects, a clinical trial data matching method executed in a clinical trial data matching device includes: receiving first clinical trial data from at least one website containing clinical trial data; analyzing the first clinical trial data, extracting first valid data, standardizing it, and storing it in a database; calculating a first similarity between the first valid data and second valid data stored in the database, and calculating a second similarity by applying different weights to the first similarity; matching the first clinical trial data with second clinical trial data extracted through the second similarity; and displaying the first clinical trial data and the second clinical trial data.

According to the present invention as described above, other clinical trial data that matches given clinical trial data is provided, so that the user can be provided with a wider variety of clinical trial data.

In addition, the present invention can build a database of clinical trial data by collecting clinical trial data scattered across a plurality of platforms.

In addition, by applying a weight to each index when matching clinical trial data, the present invention can measure the similarity between clinical trial data more accurately and thereby improve the matching accuracy of clinical trial results.

FIG. 1 is a block diagram showing the configuration of a clinical trial data matching device according to an embodiment of the present invention;
FIG. 2 is a flowchart of a clinical trial data matching method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a similarity calculation method according to an embodiment of the present invention;
FIG. 4 is a flowchart of a method of calculating a weight for each field according to an embodiment of the present invention; and
FIGS. 5 to 7 are diagrams illustrating clinical trial data displayed on a display unit according to an embodiment of the present invention.

The above objects, features, and advantages are described in detail below with reference to the accompanying drawings, so that a person of ordinary skill in the art to which the present invention belongs can readily implement the technical idea of the present invention. In describing the present invention, a detailed description of known technology related to the present invention is omitted where it is judged that it could unnecessarily obscure the gist of the present invention.

In the drawings, the same reference numerals denote the same or similar elements, and all combinations described in the specification and claims may be combined in any manner. Unless otherwise specified, a reference to the singular may include one or more, and a reference to a singular expression may also include the plural.

The terms used in this specification are intended only to describe specific exemplary embodiments and are not intended to be limiting. Singular expressions used in this specification may also be intended to include the plural unless the sentence clearly indicates otherwise. The term "and/or" includes any one of, and all combinations of, the associated listed items. The terms "comprises", "comprising", "includes", "including", "has", and "having" are inclusive: they specify the stated features, integers, steps, operations, elements, and/or components, and do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The steps, processes, and operations of the methods described in this specification should not be construed as necessarily being performed in the particular order discussed or illustrated unless an order of performance is specifically established. It should also be understood that additional or alternative steps may be used.

In addition, each component may be implemented as its own hardware processor, the components may be integrated and implemented as a single hardware processor, or the components may be combined with one another and implemented as a plurality of hardware processors.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram showing the configuration of a clinical trial data matching device according to an embodiment of the present invention. Referring to FIG. 1, the clinical trial data matching device includes a data receiving unit 110, a data extraction unit 130, a calculation unit 150, a data matching unit 170, and a display unit 190.

The data receiving unit 110 may receive first clinical trial data from at least one website containing clinical trial data. More specifically, the data receiving unit 110 may receive clinical trial approval information from the Ministry of Food and Drug Safety (MFDS), clinical study registration information from CRIS (Clinical Research Information Service), and global clinical study registration information from ClinicalTrials.gov (A Service of the U.S. National Institutes of Health).

The data receiving unit 110 may receive the first clinical trial data uploaded to these sites periodically or irregularly.

The data extraction unit 130 may crawl the first clinical trial data, extract first valid data, and store it in a database. The first valid data extracted by crawling includes the title, trial phase, medicine, subject sex, age, trial method, trial tool, or biological tissue information.

For example, when extracting valid data for the trial phase, the data extraction unit 130 may use "생동|연구자 임상시험" (bioequivalence | investigator-initiated trial) as the valid data if no trial phase value exists, and if the value 'Phase 3' exists, it may remove the character string, extract only the number, and use 3 as the valid data. For other valid data, None may be set if there is no value.
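The phase-extraction rule above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names `normalize_phase` and `normalize_other` are hypothetical, and treating phase text that contains no digit like a missing value is an assumption the description does not state.

```python
import re
from typing import Optional, Union

# Placeholder stored when a record has no trial-phase value; the literal is
# taken from the description (bioequivalence | investigator-initiated trial).
NO_PHASE_LABEL = "생동|연구자 임상시험"


def normalize_phase(raw: Optional[str]) -> Union[int, str]:
    """Return the number in text such as 'Phase 3', or the placeholder."""
    if raw is None or not raw.strip():
        return NO_PHASE_LABEL
    match = re.search(r"\d+", raw)
    # Assumption: phase text without any digit is handled like a missing value.
    return int(match.group()) if match else NO_PHASE_LABEL


def normalize_other(raw: Optional[str]) -> Optional[str]:
    """Fields other than the phase fall back to None when they are empty."""
    if raw is None or not raw.strip():
        return None
    return raw.strip()
```

Under these assumptions, `normalize_phase("Phase 3")` yields the integer 3, while an empty sex or age field is stored as `None`.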

The data extraction unit 130 may standardize the first valid data so that clinical trial data can be managed efficiently. It may do so using a previously created clinical trial terminology database. The terminology database contains Korean/English conversion information, synonyms, abbreviations, key keywords, and the like for clinical trial terms, so that different terms with the same meaning are unified and the consistency of the clinical trial data is improved.
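One way to picture this standardization is a lookup from term variants to a canonical term. The sketch below is only an assumption about how such a terminology database might be used; the table contents, the name `TERM_DB`, and the function `standardize_term` are hypothetical and do not come from the specification.

```python
from typing import Dict

# Hypothetical excerpt of a terminology database: variant -> canonical term.
# It mixes an abbreviation, a spelling variant, and Korean/English conversion.
TERM_DB: Dict[str, str] = {
    "rct": "randomized controlled trial",
    "randomised controlled trial": "randomized controlled trial",
    "무작위배정": "randomization",
    "위약": "placebo",
}


def standardize_term(term: str, term_db: Dict[str, str] = TERM_DB) -> str:
    """Map a term to its canonical form; unknown terms are only trimmed and lowercased."""
    key = term.strip().lower()
    return term_db.get(key, key)
```

A real terminology database would presumably be far larger and stored in the database itself rather than in code; the dictionary here only shows the lookup step.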

In addition, the institutions, researchers, affiliations, and contact details stated in various data such as clinical trial data, papers, and patents are not standardized. The data extraction unit 130 may therefore standardize institutions, researchers, affiliations, and contact details using a database that stores currently published institution names and researcher information.

The data extraction unit 130 creates an index table whose fields are the title, trial phase, medicine, subject sex, age, trial method, trial tool, and biological tissue information, updates the index table by indexing the first valid data, and stores it in the database. The index table serves to manage the data stored in the database efficiently and is itself stored in the database. When updating the index table for the first valid data, the data extraction unit 130 may additionally store a matching state. The matching state may be a matching-needed state, a matching-complete state, or a no-match state.
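The index table and its matching state could be modeled, for illustration, as one record per trial. The class and field names below (`IndexRecord`, `MatchStatus`, `trial_id`, and so on) are hypothetical, and an in-memory dictionary stands in for the database table.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class MatchStatus(Enum):
    NEEDED = "matching needed"
    COMPLETE = "matching complete"
    NONE = "no match"


@dataclass
class IndexRecord:
    trial_id: str                    # hypothetical key; not named in the source
    title: Optional[str] = None
    phase: Optional[str] = None
    drug: Optional[str] = None
    sex: Optional[str] = None
    age: Optional[str] = None
    method: Optional[str] = None
    tool: Optional[str] = None
    tissue: Optional[str] = None
    # Newly indexed data starts out waiting to be matched (an assumption).
    status: MatchStatus = MatchStatus.NEEDED


# Stand-in for the index table stored in the database.
index_table: Dict[str, IndexRecord] = {}


def upsert(record: IndexRecord) -> None:
    """Insert a record or overwrite the existing one with the same key."""
    index_table[record.trial_id] = record
```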

The calculation unit 150 may calculate a first similarity between the first valid data and other, second valid data already stored in the database. To calculate the similarity, the calculation unit 150 may first apply natural language processing to the valid data. Natural language processing is a technology by which a machine such as a computer interprets and imitates human language by analyzing it. Words are expressed numerically using a method such as DTM (document-term matrix) or Word2Vec, and the similarity is calculated from the difference between words using Euclidean distance, cosine similarity, or the like.
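As a concrete instance of the techniques named above, the sketch below builds document-term count vectors for two texts and compares them with cosine similarity. It is a minimal illustration using only the standard library; whitespace tokenization and the function names are assumptions, and the specification does not say which vectorization or distance the device actually uses.

```python
import math
from collections import Counter


def term_counts(text: str) -> Counter:
    """One row of a document-term matrix: token -> count (whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as dissimilar to everything
    return dot / (norm_a * norm_b)
```

Two titles that share no tokens score 0.0, and identical titles score 1.0 up to floating-point rounding.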

The calculation unit 150 may calculate the first similarity for the field values of the first valid data and the second valid data. That is, it calculates a first similarity separately for each field: title, trial phase, medicine, subject sex, age, trial method, trial tool, and biological tissue information. The calculation unit 150 follows conventional methods in calculating the first similarity.

The calculation unit 150 may calculate the second similarity by applying a per-field weight to the first similarity calculated for each field value of the first and second valid data. A higher weight is set for fields that are important in determining the similarity of clinical trial data. For example, the title of clinical trial data is written according to a specific format, so the kinds of information it contains are clear, and that information is a condensed form of the information about the clinical trial. The weight for the title may therefore be set highest.
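The weighting step can be sketched as a weighted average of the per-field first similarities. The specification only says that weights are applied to the first similarity; dividing by the sum of the weights so the result stays between 0 and 1 is an assumption made here, as are the function name and the example weights.

```python
from typing import Dict


def second_similarity(first: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-field first similarities.

    Fields without a weight contribute nothing. Normalizing by the total
    weight is an assumption, not something the specification states.
    """
    total = sum(weights.get(field, 0.0) for field in first)
    if total == 0:
        return 0.0
    weighted = sum(weights.get(field, 0.0) * s for field, s in first.items())
    return weighted / total


# Hypothetical example: the title carries the largest weight.
example_first = {"title": 0.9, "phase": 1.0, "drug": 0.5}
example_weights = {"title": 0.5, "phase": 0.2, "drug": 0.3}
example_score = second_similarity(example_first, example_weights)
```

With these example values the weighted terms are 0.45, 0.20, and 0.15, so `example_score` comes out at about 0.8.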

The calculation unit 150 may use values preset by an administrator as the weights, or may set the weights through reinforcement learning. Reinforcement learning is a method in which an agent defined within a specific environment recognizes the current state and selects, from among the available actions, the action or sequence of actions that maximizes the reward.

The calculation unit 150 may create the weight-setting environment using the first similarities of the field values of the valid data corresponding to already matched clinical trial data stored in the database, together with the user's verification information from the matching process of that clinical trial data.

The calculation unit 150 treats the first similarities of the field values of the valid data corresponding to the first clinical trial data as the current state, and the weight for each field value as the action. When the agent selects the weights that maximize the reward for the field values, the second similarity is calculated by applying those weights to the first similarities. The agent may then be rewarded according to the user's verification result for at least one second clinical trial data item whose similarity is the highest or is at or above a preset threshold.

To perform this process, the calculation unit 150 may include a model generation unit 151 that generates a reinforcement learning model, an action selection unit 153, and a reward providing unit 155.

The model generation unit 151 may generate the reinforcement learning model using the first similarities of the field values of at least one valid data item corresponding to the clinical trial data stored in the database. The second similarity is generated by applying per-field weights to these first similarities, and it identifies the second clinical trial data that has the highest similarity to the reference first clinical trial data or a similarity at or above a preset threshold. The model generation unit 151 may generate the reinforcement learning model through the reward given according to the user's verification result for that second clinical trial data. Furthermore, the model generation unit 151 may strengthen the reinforcement learning model through repeated learning so that the action selection unit 153 can set the weights that maximize the reward.

The action selection unit 153 may set the per-field weights that maximize the reward.

The reward providing unit 155 calculates the second similarity by applying the per-field weights set by the action selection unit 153 to the first similarities of the field values of the first valid data of the first clinical trial data, and may calculate the reward using the user's verification result for at least one second clinical trial data item extracted through the second similarity. The reward providing unit 155 thus calculates the reward based on the user's feedback.
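The loop of selecting weights, matching, and receiving a reward from the user's verification can be illustrated with a deliberately simplified stand-in: an epsilon-greedy multi-armed bandit that chooses among a few candidate weight vectors. This is not the reinforcement learning model of the specification, which conditions on the first similarities as state. The class `WeightBandit`, the candidate weights, and the 0/1 reward convention (1 when the user confirms a match, 0 when the user cancels it) are all assumptions.

```python
import random
from typing import Dict, List

Weights = Dict[str, float]


class WeightBandit:
    """Epsilon-greedy choice among candidate per-field weight vectors."""

    def __init__(self, candidates: List[Weights], epsilon: float = 0.1, seed: int = 0):
        self.candidates = candidates
        self.epsilon = epsilon
        self.counts = [0] * len(candidates)
        self.values = [0.0] * len(candidates)  # running mean reward per candidate
        self.rng = random.Random(seed)

    def select(self) -> int:
        """Explore with probability epsilon; otherwise pick the best candidate so far."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.candidates))
        return max(range(len(self.candidates)), key=lambda i: self.values[i])

    def update(self, arm: int, reward: float) -> None:
        """Fold one verification result into the running mean for that candidate."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

In use, each round would call `select()`, compute the second similarity with `candidates[arm]`, show the resulting match to a reviewer, and pass 1.0 or 0.0 to `update()` depending on whether the match was confirmed.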

The data matching unit 170 may match the first clinical trial data with second clinical trial data determined to be similar to it. The data matching unit 170 changes the matching state of the first and second clinical trial data to the matching-complete state.

The display unit 190 displays on a screen the first clinical trial data and at least one second clinical trial data item matched with it. The display unit 190 may display the first and second clinical trial data as shown in FIGS. 5 to 7.

Hereinafter, the clinical trial data matching method is described with reference to FIGS. 2 to 4. In describing the method, detailed embodiments that overlap with the clinical trial data matching device described above may be omitted. The clinical trial data matching device, which carries out the method, may be implemented as a server, and is referred to below as the server for convenience of explanation.

In step 100, the server may receive first clinical trial data from at least one website containing clinical trial data. More specifically, the data receiving unit 110 may receive clinical trial approval information from the Ministry of Food and Drug Safety (MFDS), clinical study registration information from CRIS (Clinical Research Information Service), and global clinical study registration information from ClinicalTrials.gov (A Service of the U.S. National Institutes of Health).

In step 200, the server may crawl the first clinical trial data, extract first valid data, and store it in the database. The first valid data extracted by crawling includes the title, trial phase, medicine, subject sex, age, trial method, trial tool, or biological tissue information.

The server may also standardize the first valid data so that clinical trial data can be managed efficiently. It may do so using a previously created clinical trial terminology database. The terminology database contains Korean/English conversion information, synonyms, abbreviations, key keywords, and the like for clinical trial terms, so that different terms with the same meaning are unified and the consistency of the clinical trial data is improved.

In addition, the institutions, researchers, affiliations, and contact details stated in various data such as clinical trial data, papers, and patents are not standardized. The server may therefore standardize institutions, researchers, affiliations, and contact details using a database that stores currently published institution names and researcher information.

The server creates an index table whose fields are the title, trial phase, medicine, subject sex, age, trial method, trial tool, and biological tissue information, updates the index table by indexing the first valid data, and stores it in the database. The index table serves to manage the data stored in the database efficiently and is itself stored in the database.

Furthermore, when updating the index table for the first valid data, the server may additionally store a matching state. The matching state may be a matching-needed state, a matching-complete state, or a no-match state.

In step 300, the server may calculate a first similarity between the first valid data and other, second valid data already stored in the database. To calculate the first similarity, the server may first apply natural language processing to the valid data (S310).

In step 320, the server may calculate the first similarity for the field values of the first valid data and the second valid data. That is, it calculates a similarity separately for each field: title, trial phase, medicine, subject sex, age, trial method, trial tool, and biological tissue information. The server may follow conventional methods in calculating the similarity.
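Step 320 leaves the per-field method open ("conventional methods"), so the following is only one possible arrangement: token-set Jaccard overlap for free-text fields and exact equality for categorical fields. The split into `TEXT_FIELDS`, the field names, and the choice of Jaccard are all assumptions made for illustration.

```python
from typing import Dict

# Assumption: these fields hold free text; all other fields are compared exactly.
TEXT_FIELDS = {"title", "drug", "method", "tool", "tissue"}


def jaccard(a: str, b: str) -> float:
    """Share of distinct tokens that the two texts have in common."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def field_similarities(first: Dict[str, object], second: Dict[str, object]) -> Dict[str, float]:
    """First similarity for every field present in both records."""
    result: Dict[str, float] = {}
    for name in first.keys() & second.keys():
        x, y = first[name], second[name]
        if x is None or y is None:
            result[name] = 0.0  # a missing value cannot support a match
        elif name in TEXT_FIELDS:
            result[name] = jaccard(str(x), str(y))
        else:
            result[name] = 1.0 if x == y else 0.0
    return result
```

The resulting dictionary has one score per shared field, which is the shape the weighting in step 330 operates on.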

In step 330, the server may calculate the second similarity by applying a per-field weight to the first similarity calculated for each field value of the first and second valid data. A higher weight is set for fields that are important in determining the similarity of clinical trial data.

In step 331, the server may create the weight-setting environment using the first similarities of the field values of the valid data corresponding to already matched clinical trial data stored in the database, together with the user's verification information from the matching process of that clinical trial data.

In step 333, the server may generate a reinforcement learning model using the first similarities of the field values of at least one valid data item corresponding to the clinical trial data stored in the database. The second similarity obtained by applying per-field weights to these first similarities identifies the second clinical trial data that has the highest similarity to the reference first clinical trial data or a similarity at or above a preset threshold. The server may generate the reinforcement learning model through the reward given according to the user's verification result for that second clinical trial data.

In step 335, the server may set the per-field weights that maximize the reward.

In step 337, the server calculates the second similarity by applying the set per-field weights to the first similarities of the field values of the first valid data of the first clinical trial data, and may calculate the reward using the user's verification result for at least one second clinical trial data item extracted through the second similarity.

By repeating steps 331 to 337, the server may set the per-field weights that yield the highest reward and calculate the second similarity with them.

In step 400, the server matches the first clinical trial data with second clinical trial data determined to be similar to it, and changes the matching state of the first and second clinical trial data (S500). If second clinical trial data matching the first clinical trial data exists, the server changes the matching state of the first and second clinical trial data to the matching-complete state. If no such second clinical trial data exists, the server changes the matching state of the first clinical trial data to the no-match state.
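The matching decision and state change of steps 400 and S500 can be sketched as a threshold filter over the second similarities. The function name, the shape of the `scores` mapping (candidate trial id to second similarity), and the default threshold of 0.8 are assumptions; the specification only refers to a preset threshold.

```python
from typing import Dict, List, Tuple


def match_trials(scores: Dict[str, float], threshold: float = 0.8) -> Tuple[str, List[str]]:
    """Return the new matching state of the first trial and the matched trial ids.

    Matched ids are ordered from the most to the least similar.
    """
    matched = sorted(
        (trial_id for trial_id, score in scores.items() if score >= threshold),
        key=lambda trial_id: scores[trial_id],
        reverse=True,
    )
    status = "matching complete" if matched else "no match"
    return status, matched
```

For example, with scores of 0.9, 0.5, and 0.85 for candidates A, B, and C, candidates A and C are matched and the state becomes matching complete; with no candidate at or above the threshold the state becomes no match.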

The server transmits a verification request signal for the first and second clinical trial data in the matching-complete status to at least one administrator terminal, and displays on the administrator terminal the first and second clinical trial data together with buttons such as "confirm match" and "cancel match", so that the user can select the verification result more easily.

Furthermore, when the server receives additional memo information from the user terminal, it may add that information to the first and second clinical trial data.

In step 600, the server displays on the screen the first clinical trial data and at least one second clinical trial data matched with the first clinical trial data.

The embodiments of the present invention disclosed in this specification and the drawings are merely specific examples presented to explain the technical content of the present invention clearly and to aid understanding of it, and are not intended to limit the scope of the present invention. It is obvious to those of ordinary skill in the art to which the present invention pertains that, in addition to the embodiments disclosed herein, other modifications based on the technical idea of the present invention can be implemented.

110: data receiving unit,
130: data extraction unit,
150: calculation unit,
170: data matching unit,
190: display unit

Claims

A clinical trial data matching device comprising:
a data receiving unit that receives first clinical trial data from at least one website including clinical trial data;
a data extraction unit that analyzes the first clinical trial data, extracts first valid data, standardizes it, and stores it in a database;
a calculation unit that calculates a first similarity for the field values of the first valid data and second valid data stored in the database, calculates a second similarity by applying a weight for each field to the first similarity calculated for each field value of the first valid data and the second valid data, sets the first similarity of the field values of the valid data corresponding to the first clinical trial data as the current state (State) and the weight for each field value as the action (Action), and, when an agent selects the weight that maximizes the reward for a field value, provides the reward to the agent using a user's verification result for at least one second clinical trial data whose second similarity, calculated by applying the weight to the first similarity, is the highest or is equal to or greater than a predetermined threshold value;
a data matching unit that matches the first clinical trial data with the second clinical trial data extracted through the second similarity; and
a display unit that displays the first clinical trial data and the second clinical trial data,
wherein the calculation unit comprises:
a model generating unit that generates a reinforcement learning model through a reward according to a user's verification result for second clinical trial data which, according to the second similarity generated by applying a weight for each field to the first similarity for the field values of at least one valid data corresponding to the clinical trial data stored in the database, has the highest similarity to the reference first clinical trial data or has a similarity equal to or greater than a preset threshold value;
an action selection unit that sets the weight for each field that maximizes the reward; and
a reward providing unit that calculates the second similarity by applying the weight for each field set by the action selection unit to the first similarity of the field values of the first valid data of the first clinical trial data, and calculates the reward using the user's verification result for at least one second clinical trial data extracted through the second similarity.

A clinical trial data matching method executed in a clinical trial data matching device, the method comprising:
receiving first clinical trial data from at least one website including clinical trial data;
analyzing the first clinical trial data, extracting first valid data, standardizing the first valid data, and storing it in a database;
calculating a first similarity between the first valid data and second valid data stored in the database, and calculating a second similarity by assigning different weights to the first similarity;
matching the first clinical trial data with second clinical trial data extracted through the second similarity; and
displaying the first clinical trial data and the second clinical trial data,
wherein the calculating of the first similarity between the first valid data and the second valid data stored in the database, and of the second similarity by assigning different weights to the first similarity, comprises:
calculating a first similarity for the field values of the first valid data and the second valid data stored in the database, and calculating a second similarity by applying a weight for each field to the first similarity calculated for each field value of the first valid data and the second valid data; and
setting the first similarity of the field values of the valid data corresponding to the first clinical trial data as the current state and the weight for each field value as the action, and, when an agent selects the weight that maximizes the reward for a field value, providing the reward to the agent using a user's verification result for at least one second clinical trial data whose second similarity, calculated by applying the weight to the first similarity, is the highest or is equal to or greater than a predetermined threshold value,
and wherein the calculating further comprises:
generating a reinforcement learning model through a reward according to a user's verification result for second clinical trial data which, according to the second similarity generated by applying a weight for each field to the first similarity for the field values of at least one valid data corresponding to the clinical trial data stored in the database, has the highest similarity to the reference first clinical trial data or has a similarity equal to or greater than a preset threshold value;
setting the weight for each field that maximizes the reward; and
calculating the second similarity by applying the set weight for each field to the first similarity of the field values of the first valid data of the first clinical trial data, and calculating the reward using the user's verification result for at least one second clinical trial data extracted through the second similarity.