KR20220091055A

KR20220091055A - Apparatus and method for selecting the main eligibility criteria to increase the efficiency of clinical trial feasibility assessment

Info

Publication number: KR20220091055A
Application number: KR1020200182180A
Authority: KR
Inventors: 이형기; 전유민
Original assignee: 서울대학교병원
Priority date: 2020-12-23
Filing date: 2020-12-23
Publication date: 2022-06-30
Also published as: KR102625820B1

Abstract

An apparatus for selecting main eligibility criteria to improve the efficiency of clinical trial feasibility assessment according to an embodiment may include: a data collection unit that collects the eligibility criteria of clinical trials related to a target disease and the clinical information of patients related to the target disease; and a processor that generates a feature selection model based on the collected eligibility criteria and the collected patient clinical information and, using the generated feature selection model, selects the main eligibility criteria to be used in assessing the feasibility of a clinical trial related to the target disease.

Description

Apparatus and method for selecting the main eligibility criteria to increase the efficiency of clinical trial feasibility assessment

The present disclosure relates to an apparatus and method for selecting main eligibility criteria to improve the efficiency of clinical trial feasibility assessment.

Efficient conduct of clinical trials is a prerequisite for reducing the cost of new drug development and raising its success rate. Clinical trials are the most important and most expensive stage of new drug development; selecting patients who will respond well to the treatment under development and enrolling them quickly increases both the efficiency and the likelihood of success of development. A prior review of clinical trial feasibility is therefore important. This review is the process of checking, before a trial begins, roughly how many patients satisfy the eligibility criteria (EC), identifying which conditions in the eligibility criteria could slow patient enrollment, and revising those conditions.

Traditional feasibility assessment relies solely on investigators' recollection, without supporting data, so most investigators overstate the number of patients they can actually enroll, which reduces efficiency and accuracy. Assessing feasibility with electronic medical records, prescription databases, or insurance claims databases is better than the traditional approach, but it remains difficult to obtain supporting data when the required information is missing. In addition, rule-based evidence-based feasibility (EbF) assessment severely underestimates the number of enrollable patients whenever the data are incomplete, the information is hard to search (as with unstructured data), or the eligibility criteria cannot be mapped to the hospital's patient clinical information, so it is inefficient and inaccurate.

Korean Laid-open Patent Publication No. 10-2019-0134315

An object is to provide an apparatus and method for selecting main eligibility criteria to improve the efficiency of clinical trial feasibility assessment.

An apparatus for selecting main eligibility criteria to improve the efficiency of clinical trial feasibility assessment according to one aspect may include: a data collection unit that collects the eligibility criteria of clinical trials related to a target disease and the clinical information of patients related to the target disease; and a processor that generates a feature selection model based on the collected eligibility criteria and the collected patient clinical information and, using the generated feature selection model, selects the main eligibility criteria to be used in assessing the feasibility of a clinical trial related to the target disease.

The processor may include: a training data generation unit that generates training data based on the collected eligibility criteria and the collected patient clinical information; a feature selection model generation unit that generates the feature selection model by training a machine learning or deep learning model on the generated training data; and a feature selection unit that uses the generated feature selection model to select the main eligibility criteria from among the eligibility criterion features extracted from the eligibility criteria.

The training data generation unit may generate the training data by extracting quantifiable eligibility criterion features from the eligibility criteria of each clinical trial, generating mapping data by mapping the extracted features to the clinical information of each patient, and labeling the mapping data with each patient's eligibility for each clinical trial, as determined from the extracted features, the conditions on those features, and the clinical information of each patient.

The training data generation unit may preprocess the collected eligibility criteria and the collected patient clinical information using a natural language processing technique or a missing-value handling technique.

The training data generation unit may preprocess the generated training data using a scaling technique or a stacked autoencoder.

The machine learning model may include lasso regression and random forest, and the deep learning model may include a neural network.

The feature selection model generation unit may train the machine learning or deep learning model to predict each patient's eligibility for each clinical trial, based on the eligibility criterion features extracted from the eligibility criteria of each clinical trial and the patient clinical information mapped to the extracted features.

The feature selection unit may determine, from the generated feature selection model, the importance of each eligibility criterion feature extracted from the eligibility criteria, and select the main eligibility criteria based on the determined importance.

The processor may further include a performance evaluation unit that obtains test data and evaluates the performance of the feature selection model using the obtained test data.

The performance evaluation unit may evaluate the performance of the feature selection model using the following equation.

[Equation]

Here, i is the index of a clinical trial; n(EC feature for CT_i) is the number of eligibility criterion features of the i-th clinical trial; n(EC feature for all CT) is the total number of eligibility criterion features across all clinical trials; n(eligible pt with EC_original features for CT_i) is the number of patients who satisfy the conditions of all eligibility criterion features of the i-th clinical trial; and n(eligible pt with EC_selected features for CT_i) is the number of patients who satisfy the conditions of all eligibility criterion features selected by the feature selection model for the i-th clinical trial.
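The two patient counts defined above can be illustrated with a minimal Python sketch. It computes only the counts; how the equation combines them is not reproduced here. The function name `count_eligible`, the inclusive range representation of a condition, and the sample values are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical representation: feature name -> (inclusive lower, inclusive upper)
Conditions = Dict[str, Tuple[float, float]]


def count_eligible(
    patients: List[Dict[str, float]],
    conditions: Conditions,
    features: Optional[Iterable[str]] = None,
) -> int:
    """Count patients who satisfy every condition among the given features.

    With features=None, all features of the trial are used
    (the "EC_original" count); otherwise only the listed ones
    (the "EC_selected" count).
    """
    names = list(conditions) if features is None else list(features)
    return sum(
        all(conditions[n][0] <= patient[n] <= conditions[n][1] for n in names)
        for patient in patients
    )


# Illustrative data only; thresholds are not taken from any real trial.
trial_conditions: Conditions = {"age": (20, 40), "hba1c": (7.0, 10.0)}
sample_patients = [
    {"age": 35, "hba1c": 8.0},
    {"age": 30, "hba1c": 6.0},
    {"age": 50, "hba1c": 8.0},
]

n_original = count_eligible(sample_patients, trial_conditions)
n_selected = count_eligible(sample_patients, trial_conditions, features=["age"])
```

Dropping a feature can only keep or raise the eligible count, so the gap between the two counts shows how much the selected subset loosens the original criteria.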

A method for selecting main eligibility criteria to improve the efficiency of clinical trial feasibility assessment according to another aspect may include: collecting the eligibility criteria of clinical trials related to a target disease and the clinical information of patients related to the target disease; generating training data based on the collected eligibility criteria and the collected patient clinical information; generating a feature selection model by training a machine learning or deep learning model on the generated training data; and selecting, using the generated feature selection model, main eligibility criteria from among the eligibility criterion features extracted from the eligibility criteria.

The generating of the training data may include: extracting quantifiable eligibility criterion features from the eligibility criteria of each clinical trial; generating mapping data by mapping the extracted features to the clinical information of each patient; and determining each patient's eligibility for each clinical trial based on the extracted features, the conditions on those features, and the clinical information of each patient, and labeling the mapping data with the result.

The generating of the training data may further include preprocessing the collected eligibility criteria and the collected patient clinical information using a natural language processing technique or a missing-value handling technique.

The generating of the training data may further include preprocessing the generated training data using a scaling technique or a stacked autoencoder.

The machine learning model may include lasso regression and random forest, and the deep learning model may include a neural network.

The generating of the feature selection model may train the machine learning or deep learning model to correctly predict each patient's eligibility for each clinical trial, based on the eligibility criterion features extracted from the eligibility criteria of each clinical trial and the patient clinical information mapped to the extracted features.

The selecting of the main eligibility criteria may determine, from the generated feature selection model, the importance of each eligibility criterion feature extracted from the eligibility criteria, and select the main eligibility criteria based on the determined importance.

The method for selecting main eligibility criteria to improve the efficiency of clinical trial feasibility assessment may further include obtaining test data and evaluating the performance of the feature selection model using the obtained test data.

The evaluating of the performance of the feature selection model may use the following equation.

[Equation]

According to one aspect, main eligibility criteria that improve the efficiency of assessing the feasibility of a clinical trial related to a target disease can be selected using machine learning or deep learning. Assessing feasibility with the selected main eligibility criteria can therefore improve both the efficiency and the accuracy of the assessment.

FIG. 1 is a diagram illustrating an apparatus for selecting main eligibility criteria to improve the efficiency of clinical trial feasibility assessment according to an exemplary embodiment.
FIG. 2 is a diagram for explaining training data.
FIG. 3 is a diagram illustrating an apparatus for selecting main eligibility criteria to improve the efficiency of clinical trial feasibility assessment according to an exemplary embodiment.
FIG. 4 is a diagram illustrating a method for selecting main eligibility criteria to improve the efficiency of clinical trial feasibility assessment according to an exemplary embodiment.
FIG. 5 is a diagram illustrating the importance of eligibility criterion features determined from a feature selection model for type 2 diabetes according to an exemplary embodiment.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings. Note that, in assigning reference numerals to the components of each drawing, the same components are given the same numerals wherever possible, even when they appear in different drawings. In describing the present invention, detailed descriptions of related well-known functions or configurations are omitted where they would unnecessarily obscure the gist of the invention.

Unless a specific order is clearly stated in context, the steps may be performed in an order different from the stated one. That is, the steps may be performed in the stated order, substantially simultaneously, or in reverse order.

The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or practice of users or operators. Their definitions should therefore be based on the content of this specification as a whole.

Terms such as "first" and "second" may be used to describe various components, but the components are not limited by these terms; the terms serve only to distinguish one component from another. A singular expression includes the plural unless the context clearly indicates otherwise. Terms such as "include" or "have" specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and do not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In addition, the components in this specification are divided only according to the main function each one performs. That is, two or more components may be combined into one, or one component may be divided into two or more by more finely subdivided functions. Each component may, in addition to its own main function, perform some or all of the functions of other components, and some of the main functions of a component may be carried out entirely by another component. Each component may be implemented in hardware, in software, or in a combination of hardware and software.

FIG. 1 is a diagram illustrating an apparatus for selecting main eligibility criteria to improve the efficiency of clinical trial feasibility assessment according to an exemplary embodiment, and FIG. 2 is a diagram for explaining training data.

Referring to FIG. 1, an apparatus 100 for selecting main eligibility criteria to improve the efficiency of clinical trial feasibility assessment according to an exemplary embodiment (hereinafter, the main eligibility criteria selection apparatus) may include a data collection unit 110 and a processor 120.

The data collection unit 110 may collect the structured eligibility criteria of a plurality of clinical trials conducted for a target disease, and the clinical information of a plurality of patients related to the target disease. Here, an eligibility criterion may include an eligibility criterion feature (hereinafter, a feature) and a condition on that feature. For example, if the eligibility criterion is "age of at least 20 and at most 40 years", the feature is "age" and the condition is "at least 20 and at most 40 years". The patient clinical information may include each patient's clinical information relevant to the eligibility criteria of the clinical trials.
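The split of an eligibility criterion into a feature and a condition can be sketched as a small data structure. This is a minimal illustration, assuming the condition is an inclusive numeric range; the class name `EligibilityCriterion` and its fields are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EligibilityCriterion:
    """One eligibility criterion, split into a feature and its condition."""

    feature: str                   # e.g. "age"
    lower: Optional[float] = None  # inclusive lower bound; None means unbounded
    upper: Optional[float] = None  # inclusive upper bound; None means unbounded

    def is_satisfied_by(self, value: float) -> bool:
        """Return True when the value meets the condition."""
        if self.lower is not None and value < self.lower:
            return False
        if self.upper is not None and value > self.upper:
            return False
        return True


# "Age of at least 20 and at most 40 years"
age_criterion = EligibilityCriterion(feature="age", lower=20, upper=40)
```

Categorical conditions (for example, sex) would need a different condition type; the range form is used here only because it matches the age example.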

According to an exemplary embodiment, the data collection unit 110 may collect the eligibility criteria of a plurality of clinical trials conducted for the target disease from an external database that stores clinical trial information (e.g., ClinicalTrials.gov). The data collection unit 110 may also collect the clinical information of a plurality of patients related to the target disease from an external database that stores patient clinical information (e.g., an electronic medical record (EMR)). In doing so, the data collection unit 110 may use wired or wireless communication technology. The wireless communication technology may include, but is not limited to, Bluetooth communication, Bluetooth Low Energy (BLE) communication, near field communication (NFC), WLAN communication, Zigbee communication, Infrared Data Association (IrDA) communication, Wi-Fi Direct (WFD) communication, ultra-wideband (UWB) communication, Ant+ communication, Wi-Fi communication, radio frequency identification (RFID) communication, 3G communication, 4G communication, and 5G communication.

The processor 120 may control the overall operation of the main eligibility criteria selection apparatus 100.

When a predetermined event occurs, the processor 120 may control the data collection unit 110 to collect the eligibility criteria of a plurality of clinical trials conducted for the target disease and the clinical information of a plurality of patients related to the target disease.

The processor 120 may generate a feature selection model based on the collected eligibility criteria and patient clinical information and, using the generated feature selection model, select from among the features of the eligibility criteria the main eligibility criteria (hereinafter, main features) to be used in assessing the feasibility of a clinical trial related to the target disease.

The processor 120 may include a training data generation unit 121, a feature selection model generation unit 122, and a feature selection unit 123.

The training data generation unit 121 may preprocess the collected eligibility criteria and/or the collected patient clinical information.

According to an exemplary embodiment, the training data generation unit 121 may preprocess the collected eligibility criteria and/or patient clinical information using techniques such as natural language processing and missing-value handling. For example, when the collected eligibility criteria and/or patient clinical information are in free-text form, the training data generation unit 121 may preprocess them using a natural language processing technique. When the collected patient clinical information lacks information relevant to the eligibility criteria, that is, when missing values exist, the training data generation unit 121 may replace the missing values using a missing-value handling technique (e.g., multiple imputation by chained equations (MICE)).
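Missing-value replacement can be sketched as follows. Note that this sketch deliberately swaps in simple column-mean imputation as a stand-in; it is not MICE, which models each incomplete variable from the others over several rounds. The function name `impute_with_column_mean`, the use of `None` for a missing value, and the sample records are assumptions.

```python
from statistics import mean
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def impute_with_column_mean(records: List[Record]) -> List[Record]:
    """Return copies of the records with None replaced by the column mean.

    A simplified stand-in for MICE: each missing value is filled with the
    mean of the observed values of the same variable. Columns with no
    observed values are left as None.
    """
    columns = sorted({key for record in records for key in record})
    column_means: Dict[str, float] = {}
    for column in columns:
        observed = [r[column] for r in records if r.get(column) is not None]
        if observed:
            column_means[column] = mean(observed)

    filled = []
    for record in records:
        row = {}
        for column in columns:
            value = record.get(column)
            row[column] = column_means.get(column) if value is None else value
        filled.append(row)
    return filled


# Illustrative records only.
patients = [
    {"age": 30, "hba1c": 7.0},
    {"age": 50, "hba1c": None},  # missing laboratory value
    {"age": 40, "hba1c": 9.0},
]
filled_patients = impute_with_column_mean(patients)
```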

The training data generation unit 121 may extract quantifiable features from the eligibility criteria of each clinical trial. For example, quantifiable features may include demographic information (e.g., age, body mass index, and sex) and laboratory values (e.g., glycated hemoglobin and creatine).

The training data generation unit 121 may generate mapping data by mapping the extracted features to the clinical information of each patient. For example, assume that the features extracted from the eligibility criteria of a first clinical trial are age, glycated hemoglobin, fasting blood glucose, and blood albumin concentration. In this case, as shown in FIG. 2, the training data generation unit 121 may map the age, glycated hemoglobin, fasting blood glucose, and blood albumin concentration values in each patient's clinical information to the corresponding features. The extracted features and each patient's clinical information may be mapped in the same way for every clinical trial.

The training data generation unit 121 may determine each patient's eligibility for each clinical trial based on the features extracted for that trial, the conditions on those features, and the collected patient clinical information. For example, the training data generation unit 121 may check, from each patient's clinical information, whether the condition on each extracted feature is satisfied; if the conditions on all extracted features are satisfied, the patient is determined to be "eligible" for that clinical trial, and otherwise "ineligible".

The training data generation unit 121 may generate training data based on the mapping data between the eligibility criteria and the patient clinical information and on the eligibility determination results. For example, the training data generation unit 121 may build training data such as that shown in FIG. 2 by labeling the mapping data with the eligibility determination results. In FIG. 2, "1" in the eligibility column indicates eligible and "0" indicates ineligible. As described later, when the feature selection model, which is a machine learning or deep learning model, is trained, the features and the patient clinical information mapped to them are used as model inputs, and the eligibility determination results are used as targets.
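The mapping and labeling steps described above can be sketched for a single trial as follows. The feature names follow the example in the text, but the numeric thresholds, patient values, and the function name `build_training_rows` are illustrative assumptions, not values from the specification.

```python
from typing import Dict, List, Tuple

# Hypothetical conditions of one trial: feature -> (inclusive lower, inclusive upper)
Conditions = Dict[str, Tuple[float, float]]

trial_conditions: Conditions = {
    "age": (20, 40),
    "hba1c": (7.0, 10.0),
    "fasting_glucose": (100, 250),
    "albumin": (3.5, 5.5),
}

# Patient clinical information may hold more variables than the trial uses.
patient_records = [
    {"age": 35, "hba1c": 8.1, "fasting_glucose": 140, "albumin": 4.2, "weight": 70},
    {"age": 52, "hba1c": 7.5, "fasting_glucose": 150, "albumin": 4.0, "weight": 82},
]


def build_training_rows(
    conditions: Conditions, patients: List[Dict[str, float]]
) -> List[Dict[str, float]]:
    """Map patient values to the trial's features and label eligibility."""
    rows = []
    for patient in patients:
        # Mapping: keep only the values that correspond to the trial's features.
        row = {feature: patient[feature] for feature in conditions}
        # Labeling: 1 (eligible) only when every condition is satisfied.
        eligible = all(
            low <= row[feature] <= high
            for feature, (low, high) in conditions.items()
        )
        row["eligibility"] = 1 if eligible else 0
        rows.append(row)
    return rows


training_rows = build_training_rows(trial_conditions, patient_records)
```

In a later training step the feature columns would serve as model inputs and the `eligibility` column as the target.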

The training data generation unit 121 may preprocess the generated training data. For example, the training data generation unit 121 may preprocess the training data using various scaling techniques (e.g., min-max scaling and standard scaling). The training data generation unit 121 may also preprocess the training data using a stacked autoencoder (SAE). A stacked autoencoder is an algorithm that can accentuate the correlations among features, which yields a feature refinement effect. In addition, when the generated training data contain categorical variables, the training data generation unit 121 may perform preprocessing that converts the categorical variables into dummy variables.
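The scaling and dummy-variable steps named above can be sketched with the standard library alone. The function names are hypothetical, standard scaling here uses the population standard deviation, and the stacked autoencoder step is not covered by this sketch.

```python
from statistics import mean, pstdev
from typing import Dict, List


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


def standard_scale(values: List[float]) -> List[float]:
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def to_dummy_variables(values: List[str]) -> List[Dict[str, int]]:
    """Convert a categorical column into one 0/1 indicator per category."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


scaled_age = min_max_scale([20, 30, 40])
standardized = standard_scale([1.0, 2.0, 3.0])
sex_dummies = to_dummy_variables(["F", "M", "F"])
```

For models with an intercept, one indicator per categorical variable is often dropped to avoid redundancy; this sketch keeps all of them for readability.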

The feature selection model generation unit 122 may generate a feature selection model based on the training data. The algorithm that produces the feature selection model may be trained using machine learning or deep learning. For example, machine learning may include lasso regression, random forest, and the like, and deep learning may include a neural network and the like.

For example, the feature selection model generation unit 122 may train a machine learning or deep learning model using, from the training data, the features of each clinical trial and the patient clinical information mapped to each feature as inputs, and each patient's eligibility for each clinical trial as the target. That is, the feature selection model generation unit 122 may train the machine learning or deep learning model to predict each patient's eligibility for each clinical trial from the features and the patient clinical information mapped to them. According to an exemplary embodiment, the feature selection model generation unit 122 may split the training data into a training set and a validation set at a predetermined ratio. The feature selection model generation unit 122 may then train the machine learning or deep learning model on the training set, evaluate the trained model on the validation set, and adjust the model's hyperparameters based on the evaluation results. The final feature selection model may be generated through this hyperparameter tuning process.
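The split-then-tune procedure above can be sketched generically. The sketch is model-agnostic: `fit` and `score` are caller-supplied callables standing in for training and evaluating a lasso, random forest, or neural network model, and all names, the shuffling seed, and the "higher score is better" convention are assumptions.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple


def split_train_validation(
    rows: Sequence[Any], validation_ratio: float, seed: int = 0
) -> Tuple[List[Any], List[Any]]:
    """Shuffle the rows and split them into (training set, validation set)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(round(len(shuffled) * validation_ratio))
    return shuffled[n_validation:], shuffled[:n_validation]


def select_hyperparameter(
    train_rows: List[Any],
    validation_rows: List[Any],
    candidates: Sequence[Any],
    fit: Callable[[List[Any], Any], Any],
    score: Callable[[Any, List[Any]], float],
) -> Tuple[Any, float, Any]:
    """Train one model per candidate and keep the best validation score.

    Returns (best hyperparameter, its validation score, the fitted model).
    """
    if not candidates:
        raise ValueError("at least one hyperparameter candidate is required")
    best = None
    for candidate in candidates:
        model = fit(train_rows, candidate)     # train on the training set only
        value = score(model, validation_rows)  # evaluate on held-out rows
        if best is None or value > best[1]:
            best = (candidate, value, model)
    return best
```

A typical call would pass, for lasso regression, a list of regularization strengths as `candidates` and validation accuracy as `score`. The test data used for the final performance evaluation should stay outside both sets.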

The feature selection unit 123 may use the generated feature selection model to select, from among the plurality of features extracted from the eligibility criteria, the main features to be used in assessing the feasibility of a clinical trial related to the target disease. For example, the feature selection unit 123 may determine the weight or importance of each feature from the generated feature selection model and select, as a main feature, any feature whose absolute weight or importance is equal to or greater than a predetermined threshold. Here, the weight or importance may indicate the degree to which each feature influences the prediction of the eligibility value.
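The threshold rule described above amounts to a simple filter. In the sketch below, the feature names, the scores, and the threshold of 0.10 are hypothetical values chosen for illustration.

```python
def select_main_features(scores, threshold):
    """Return the features whose absolute weight or importance meets the threshold."""
    return [name for name, score in scores.items() if abs(score) >= threshold]

# Hypothetical weights; a negative weight still counts through its absolute value.
scores = {"HbA1c": 0.42, "age": -0.31, "eGFR": 0.05, "BMI": 0.12}
main_features = select_main_features(scores, threshold=0.10)
```

Taking the absolute value matters for signed weights such as regression coefficients, whereas importances from tree-based models are already non-negative.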

The selected main features may be those that have a greater influence than the other features on the feasibility assessment of a clinical trial related to the target disease. Accordingly, assessing the feasibility of a clinical trial using the selected main features can improve the efficiency and accuracy of the assessment.

According to an exemplary embodiment, the processor 120 may further include a performance evaluation unit 124.

The performance evaluation unit 124 may acquire test data and evaluate the performance of the feature selection model using the acquired test data. Here, the test data is data that was not used in training the feature selection model and may be generated in the same manner as the training data.

For example, the performance evaluation unit 124 may evaluate the performance of the feature selection model using the recall expressed by Equation 1 and/or the detectability expressed by Equation 2. The closer the recall and the detectability are to 1, the better the performance of the feature selection model.

[Equation 1]

$$\mathrm{Recall} = \frac{n(TP)}{n(TP) + n(FN)}$$

Here, n(TP) denotes the number of cases whose actual eligibility is "1" and that the feature selection model predicted as "1", and n(FN) denotes the number of cases whose actual eligibility is "1" and that the feature selection model predicted as "0".
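With n(TP) and n(FN) defined this way, recall is the share of actually eligible cases that the model also predicts as eligible. A minimal sketch with hypothetical label lists:

```python
def recall(actual, predicted):
    """Compute n(TP) / (n(TP) + n(FN)) from paired 0/1 eligibility labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("recall is undefined when there are no actual positives")
    return tp / (tp + fn)

# Hypothetical labels: four eligible cases, three of them predicted as eligible.
actual = [1, 1, 1, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 1]
score = recall(actual, predicted)  # 3 / (3 + 1) = 0.75
```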

In Equation 2, i is the index of a clinical trial, n(EC feature for CT_i) is the number of EC features of the i-th clinical trial, n(EC feature for all CT) is the total number of EC features across all clinical trials, n(eligible pt with EC_original features for CT_i) is the number of patients who satisfy all EC feature conditions of the i-th clinical trial, and n(eligible pt with EC_selected features for CT_i) is the number of patients who satisfy all conditions of the EC features selected by the feature selection model for the i-th clinical trial.
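The equation for detectability is not reproduced in this text, so its exact form cannot be confirmed here. One form that is consistent with the symbol definitions, and with the statement that values closer to 1 are better, is a weighted average of per-trial ratios. This is an assumption, not the published formula:

$$\mathrm{Detectability} \approx \sum_i \frac{n(\text{EC feature for } CT_i)}{n(\text{EC feature for all } CT)} \cdot \frac{n(\text{eligible pt with } EC_{original} \text{ features for } CT_i)}{n(\text{eligible pt with } EC_{selected} \text{ features for } CT_i)}$$

A minimal sketch under that assumption, with hypothetical per-trial counts and dictionary keys:

```python
def detectability(trials):
    """Assumed form: feature-count-weighted mean of original/selected eligible counts."""
    total_features = sum(t["n_features"] for t in trials)
    score = 0.0
    for t in trials:
        weight = t["n_features"] / total_features
        if t["eligible_selected"] == 0:
            # Assumption: a trial with no eligible patients contributes nothing.
            continue
        score += weight * t["eligible_original"] / t["eligible_selected"]
    return score

# Hypothetical counts for two trials.
trials = [
    {"n_features": 10, "eligible_original": 90, "eligible_selected": 100},
    {"n_features": 30, "eligible_original": 40, "eligible_selected": 50},
]
value = detectability(trials)  # 0.25 * 0.9 + 0.75 * 0.8
```

Because dropping criteria can only keep or enlarge the eligible pool, each ratio is at most 1 under this reading, and a value near 1 means the reduced criteria identify nearly the same patients as the full criteria.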

FIG. 3 is a diagram illustrating an apparatus for selecting main eligibility criteria to increase the efficiency of clinical trial feasibility assessment according to an exemplary embodiment.

Referring to FIG. 3, the main eligibility criteria selection apparatus 300 may include a data collection unit 110, a processor 120, an input unit 310, a storage unit 320, a communication unit 330, and an output unit 340. The data collection unit 110 and the processor 120 are as described above with reference to FIGS. 1 and 2, so a detailed description thereof is omitted.

The input unit 310 may receive various manipulation signals and information from a user. According to an embodiment, the input unit 310 may include a key pad, a dome switch, a touch pad, a jog wheel, a jog switch, a hardware (H/W) button, and the like. In particular, when the touch pad forms a mutual layer structure with a display, it may be referred to as a touch screen.

The storage unit 320 may store programs or instructions for operating the main eligibility criteria selection apparatus 300, and may store data input to and processed by the apparatus 300. For example, the storage unit 320 may store the collected eligibility criteria, the collected patient clinical information, the generated training data, the generated feature selection model, the selected main features, and the like. For example, the storage unit 320 may store these data in the form of tables, the basic unit of a relational database.
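As a rough illustration of table-based storage, the sketch below keeps selected main features in a relational table using Python's built-in `sqlite3` module. The table name `main_feature`, its columns, and the sample rows are hypothetical; the text does not specify a schema or a database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(
    "CREATE TABLE main_feature (disease TEXT, feature TEXT, importance REAL)"
)
conn.executemany(
    "INSERT INTO main_feature VALUES (?, ?, ?)",
    [("type 2 diabetes", "HbA1c", 0.42), ("type 2 diabetes", "age", 0.31)],
)
conn.commit()

# Read the stored features back, most important first.
rows = conn.execute(
    "SELECT feature FROM main_feature ORDER BY importance DESC"
).fetchall()
```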

The storage unit 320 may include at least one type of storage medium, such as a flash memory type, a hard disk type, a multimedia card micro type, a card-type memory (e.g., SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, or an optical disk. The main eligibility criteria selection apparatus 300 may also operate an external storage medium, such as web storage, that performs the storage function of the storage unit 320 over the Internet.

The communication unit 330 may communicate with an external device. For example, the communication unit 330 may transmit data input to, stored in, or processed by the main eligibility criteria selection apparatus 300 to the external device, or may receive from the external device various data for selecting the main features to be used in assessing clinical trial feasibility.

Here, the external device may be medical equipment that stores the clinical trial information and/or patient clinical information used for selecting the main features, or that uses the data input to, stored in, or processed by the main eligibility criteria selection apparatus 300. The external device may also be a digital TV, a desktop computer, a mobile phone, a smartphone, a tablet, a laptop, a personal digital assistant (PDA), a portable multimedia player (PMP), a navigation device, an MP3 player, a digital camera, a wearable device, or the like, but is not limited thereto.

The communication unit 330 may communicate with the external device using wired or wireless communication technology. The wireless communication technology may include Bluetooth, Bluetooth Low Energy (BLE), near field communication (NFC), WLAN, Zigbee, Infrared Data Association (IrDA), Wi-Fi Direct (WFD), ultra-wideband (UWB), Ant+, Wi-Fi, radio frequency identification (RFID), 3G, 4G, and 5G communication, but these are merely examples and the technology is not limited thereto.

The output unit 340 may output data input to, stored in, or processed by the main eligibility criteria selection apparatus 300. According to an embodiment, the output unit 340 may output the collected eligibility criteria, the collected patient clinical information, the generated training data, the generated feature selection model, the selected main features, and the like by at least one of an auditory, a visual, and a tactile method. To this end, the output unit 340 may include a display, a speaker, a vibrator, and the like.

FIG. 4 is a diagram illustrating a method for selecting main eligibility criteria to increase the efficiency of clinical trial feasibility assessment according to an exemplary embodiment. The method of FIG. 4 may be performed by the main eligibility criteria selection apparatus 100 or 300 of FIG. 1 or FIG. 3.

Referring to FIG. 4, the main eligibility criteria selection apparatus may collect the eligibility criteria of a plurality of clinical trials conducted in relation to a target disease and the clinical information of a plurality of patients related to the target disease (410). Here, the eligibility criteria may include features and the conditions on those features, and the patient clinical information may include each patient's clinical information related to the eligibility criteria of the clinical trials.

For example, the main eligibility criteria selection apparatus may collect the eligibility criteria of the plurality of clinical trials conducted in relation to the target disease from an external database storing clinical trial information, and the clinical information of the plurality of patients related to the target disease from an external database storing patient clinical information.

The main eligibility criteria selection apparatus may generate training data based on the collected eligibility criteria and the collected patient clinical information (420).

For example, the main eligibility criteria selection apparatus may generate the training data by extracting quantifiable features from the eligibility criteria of each clinical trial, mapping the extracted features to each patient's clinical information to generate mapping data, determining each patient's eligibility for each clinical trial based on the extracted features, the conditions on those features, and each patient's clinical information, and labeling the mapping data with the result.
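The extract, map, and label sequence can be sketched as follows, assuming each quantifiable feature carries a closed numeric range as its condition. The trial identifier, feature names, ranges, and patient values are hypothetical.

```python
def is_eligible(criteria, patient):
    """Return 1 if the patient satisfies every (low, high) condition, else 0."""
    for feature, (low, high) in criteria.items():
        value = patient.get(feature)
        if value is None or not (low <= value <= high):
            return 0
    return 1

def build_training_rows(trials, patients):
    """Map each trial's features to each patient's values and attach an eligibility label."""
    rows = []
    for trial_id, criteria in trials.items():
        for patient_id, patient in patients.items():
            mapped = {feature: patient.get(feature) for feature in criteria}
            rows.append((trial_id, patient_id, mapped, is_eligible(criteria, patient)))
    return rows

# Hypothetical criteria and patients.
trials = {"CT1": {"age": (18, 65), "HbA1c": (7.0, 10.0)}}
patients = {"P1": {"age": 50, "HbA1c": 8.1}, "P2": {"age": 70, "HbA1c": 8.5}}
rows = build_training_rows(trials, patients)
```

Each row corresponds to one trial-patient pair, so the number of rows is the number of trials multiplied by the number of patients.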

In doing so, the main eligibility criteria selection apparatus may preprocess the collected eligibility criteria and/or patient clinical information using natural language processing (NLP) techniques, missing value handling techniques, and the like, or may preprocess the training data using various scaling techniques (e.g., min-max scaling, standard scaling) and a stacked autoencoder. In addition, if the training data contains a categorical variable, the apparatus may perform preprocessing that converts the categorical variable into dummy variables.
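The scaling and dummy-variable steps named above can be sketched in plain Python. The helper names and sample values are hypothetical, and the standard scaler here uses the population standard deviation, which is one common convention.

```python
def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standard_scale(values):
    """Centre values on zero and divide by the population standard deviation."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

def to_dummies(values):
    """Convert a categorical column into 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

ages = [20, 40, 60]
scaled = min_max_scale(ages)             # [0.0, 0.5, 1.0]
sexes = to_dummies(["F", "M", "F"])      # columns ordered as ["F", "M"]
```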

The main eligibility criteria selection apparatus may generate a feature selection model based on the generated training data (430). The algorithm that generates the feature selection model may be trained using machine learning or deep learning. For example, the machine learning may include lasso regression, random forest, and the like, and the deep learning may include a neural network and the like.

For example, the main eligibility criteria selection apparatus may train a machine learning or deep learning model using, as inputs, the features of each clinical trial in the training data and the patient clinical information mapped to each feature and, as the target, each patient's eligibility for each clinical trial in the training data. That is, the apparatus may generate the feature selection model by training the machine learning or deep learning model to predict each patient's eligibility based on the features of each clinical trial and the patient clinical information mapped to each feature.

The main eligibility criteria selection apparatus may use the generated feature selection model to select, from among the plurality of features extracted from the eligibility criteria, the main eligibility criteria to be used in assessing the feasibility of a clinical trial related to the target disease (440). For example, the apparatus may determine the weight or importance of each feature from the feature selection model and select, as a main eligibility criterion, any feature whose absolute weight or importance is equal to or greater than a predetermined threshold. Here, the weight or importance may indicate the degree to which each feature influences the prediction of the eligibility value.

The main eligibility criteria selection apparatus may acquire test data and evaluate the performance of the feature selection model using the acquired test data (450). Here, the test data is data that was not used in training the feature selection model and may be generated in the same manner as the training data.

For example, the main eligibility criteria selection apparatus may evaluate the performance of the feature selection model using the recall expressed by Equation 1 and/or the detectability expressed by Equation 2. The closer the recall and the detectability are to 1, the better the performance of the feature selection model.

[Example]

The eligibility criteria of 300 clinical trials related to type 2 diabetes were collected from ClinicalTrials.gov, and the clinical information of 19,610 patients diagnosed with type 2 diabetes in the last five years was collected from the clinical data warehouse (CDW) of Seoul National University Hospital.

A limited natural language processing technique was applied to the collected EC to structure the free-text EC, which domain experts (medical professionals) then reviewed. A total of 2,744 quantifiable EC features (34 after removing duplicates) were then extracted from the EC of the 300 clinical trials.

Missing values in the collected patient clinical information were handled using multiple imputation by chained equations (MICE), and the extracted EC features were mapped to each patient's clinical information. Each patient's eligibility for each clinical trial was determined based on the EC features extracted for that trial and the collected patient clinical information, and the mapping data were labeled '1' if the patient was eligible and '0' otherwise, yielding a total of 5,883,000 data records.

The 5,883,000 records were scaled with a min-max scaler and then randomly split into 2,941,500 records for the training set, 1,764,900 records for the validation set, and 1,176,600 records for the test set.
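The reported sizes are consistent with one record per trial-patient pair and a 50/30/20 split. That reading is an inference from the numbers rather than something the text states, and the short check below only reproduces the arithmetic.

```python
n_trials, n_patients = 300, 19_610
total = n_trials * n_patients  # one record per trial-patient pair (inferred)

# Set sizes as reported in the example.
sizes = {"training": 2_941_500, "validation": 1_764_900, "test": 1_176_600}
ratios = {name: size / total for name, size in sizes.items()}
```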

Using the training set, a multi-layer feed-forward artificial neural network was trained by stochastic gradient descent with back-propagation. The trained network was evaluated on the validation set, and its hyperparameters were adjusted based on the evaluation results to produce the final feature selection model.

The importance of each EC feature that contributed to predicting the eligibility value was then determined from the generated feature selection model.

As a result, the feature importances shown in FIG. 5 were obtained.

Of the 34 EC features in total, the 16 whose feature importance was equal to or greater than a predetermined threshold were selected, and the performance of the feature selection model was then evaluated using the validation set. The recall expressed by Equation 1 and the detectability expressed by Equation 2 were used for the evaluation.

The evaluation yielded a recall of 0.85 and a detectability of 0.87888, showing that the model performed satisfactorily even though 18 of the 34 EC features had been removed.

The embodiments described above may be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium may include any type of recording device that stores data readable by a computer system. Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical disks. In addition, the computer-readable recording medium may be distributed over network-connected computer systems so that the computer-readable code is written and executed in a distributed manner.

100, 300: main eligibility criteria selection apparatus
110: data collection unit
120: processor
121: training data generation unit
122: feature selection model generation unit
123: feature selection unit
124: performance evaluation unit
310: input unit
320: storage unit
330: communication unit
340: output unit

Claims

1. An apparatus for selecting main eligibility criteria to increase the efficiency of clinical trial feasibility assessment, the apparatus comprising:
a data collection unit configured to collect eligibility criteria of clinical trials related to a target disease and clinical information of patients related to the target disease; and
a processor configured to generate a feature selection model based on the collected eligibility criteria and the collected patient clinical information, and to select, using the generated feature selection model, main eligibility criteria to be used in assessing the feasibility of a clinical trial related to the target disease.

2. The apparatus of claim 1, wherein the processor comprises:
a training data generation unit configured to generate training data based on the collected eligibility criteria and the collected patient clinical information;
a feature selection model generation unit configured to generate the feature selection model by training a machine learning or deep learning model on the generated training data; and
a feature selection unit configured to select, using the generated feature selection model, the main eligibility criteria from among eligibility criteria features extracted from the eligibility criteria.

3. The apparatus of claim 2, wherein the training data generation unit generates the training data by extracting quantifiable eligibility criteria features from the eligibility criteria of each clinical trial, mapping the extracted eligibility criteria features to the clinical information of each patient to generate mapping data, determining each patient's eligibility for each clinical trial based on the extracted eligibility criteria features, the conditions on those features, and the clinical information of each patient, and labeling the mapping data with the result.

4. The apparatus of claim 3, wherein the training data generation unit preprocesses the collected eligibility criteria and the collected patient clinical information using a natural language processing technique or a missing value handling technique.

5. The apparatus of claim 3, wherein the training data generation unit preprocesses the generated training data using a scaling technique or a stacked autoencoder.

6. The apparatus of claim 2, wherein the machine learning model includes lasso regression and random forest, and the deep learning model includes a neural network.

7. The apparatus of claim 2, wherein the feature selection model generation unit trains the machine learning or deep learning model to predict each patient's eligibility for each clinical trial based on the eligibility criteria features extracted from the eligibility criteria of each clinical trial and the patient clinical information mapped to the extracted eligibility criteria features.

8. The apparatus of claim 2, wherein the feature selection unit determines, from the generated feature selection model, the importance of the eligibility criteria features extracted from the eligibility criteria, and selects the main eligibility criteria based on the determined importance.

9. The apparatus of claim 2, wherein the processor further comprises a performance evaluation unit configured to acquire test data and evaluate the performance of the feature selection model using the acquired test data.

10. The apparatus of claim 9, wherein the performance evaluation unit evaluates the performance of the feature selection model using the following equation:
[Equation]

where i is the index of a clinical trial, n(EC feature for CT_i) is the number of eligibility criteria features of the i-th clinical trial, n(EC feature for all CT) is the total number of eligibility criteria features across all clinical trials, n(eligible pt with EC_original features for CT_i) is the number of patients who satisfy all eligibility criteria feature conditions of the i-th clinical trial, and n(eligible pt with EC_selected features for CT_i) is the number of patients who satisfy all conditions of the eligibility criteria features selected by the feature selection model for the i-th clinical trial.

11. A method for selecting main eligibility criteria to increase the efficiency of clinical trial feasibility assessment, the method comprising:
collecting eligibility criteria of clinical trials related to a target disease and clinical information of patients related to the target disease;
generating training data based on the collected eligibility criteria and the collected patient clinical information;
generating a feature selection model by training a machine learning or deep learning model on the generated training data; and
selecting, using the generated feature selection model, main eligibility criteria from among eligibility criteria features extracted from the eligibility criteria.

12. The method of claim 11, wherein generating the training data comprises:
extracting quantifiable eligibility criteria features from the eligibility criteria of each clinical trial;
generating mapping data by mapping the extracted eligibility criteria features to the clinical information of each patient; and
determining each patient's eligibility for each clinical trial based on the extracted eligibility criteria features, the conditions on those features, and the clinical information of each patient, and labeling the mapping data with the result.

13. The method of claim 12, wherein generating the training data further comprises preprocessing the collected eligibility criteria and the collected patient clinical information using a natural language processing technique or a missing value handling technique.

14. The method of claim 12, wherein generating the training data further comprises preprocessing the generated training data using a scaling technique or a stacked autoencoder.

15. The method of claim 11, wherein the machine learning model includes lasso regression and random forest, and the deep learning model includes a neural network.

16. The method of claim 11, wherein generating the feature selection model comprises training the machine learning or deep learning model to predict each patient's eligibility for each clinical trial based on the eligibility criteria features extracted from the eligibility criteria of each clinical trial and the patient clinical information mapped to the extracted eligibility criteria features.

17. The method of claim 11, wherein selecting the main eligibility criteria comprises determining, from the generated feature selection model, the importance of the eligibility criteria features extracted from the eligibility criteria, and selecting the main eligibility criteria based on the determined importance.

18. The method of claim 11, further comprising acquiring test data and evaluating the performance of the feature selection model using the acquired test data.

19. The method of claim 18, wherein evaluating the performance of the feature selection model comprises evaluating the performance of the feature selection model using the following equation:
[Equation]