KR102327062B1

KR102327062B1 - Apparatus and method for predicting result of clinical trial

Info

Publication number: KR102327062B1
Application number: KR1020180032281A
Authority: KR
Inventors: 프레드릭 기유; 김경훈; 오봉근; 유형균; 김기동
Original assignee: 딜로이트컨설팅유한회사
Priority date: 2018-03-20
Filing date: 2018-03-20
Publication date: 2021-11-17
Also published as: WO2019182297A1; KR20190110381A

Abstract

The present invention relates to an apparatus and method for predicting clinical trial results, which use a machine learning algorithm to predict the probability of success of a clinical trial for a new treatment. The apparatus includes an input unit for inputting clinical trial-related information, and a processing unit that processes clinical trial case data to generate training data, performs machine learning on the training data to generate a prediction model for predicting clinical trial results, and uses the generated prediction model to predict the clinical trial result corresponding to the clinical trial-related information.

Description

Apparatus and method for predicting clinical trial results {APPARATUS AND METHOD FOR PREDICTING RESULT OF CLINICAL TRIAL}

The present invention relates to an apparatus and method for predicting clinical trial results, which use a machine learning algorithm to predict the probability of success of a clinical trial for a new treatment.

A clinical trial is a test conducted on humans to verify the safety and efficacy of a new treatment, such as a drug, a new procedure, or a medical device, and it usually proceeds in three phases. A phase 1 trial examines safety and tolerability in a small number of healthy people. A phase 2 trial explores the appropriate dose and regimen, safety, and efficacy of the drug in a small number of patients. A phase 3 trial examines safety and efficacy in a large number of patients.

With such clinical trials, it is difficult to predict what side effects and risks patients will face, and the trials take years and are expensive. Accordingly, efforts to reduce the time and cost required for clinical trials are ongoing.

KR 1020070106027 A
KR 1020130112024 A

An object of the present invention is to provide an apparatus and method for predicting clinical trial results, which use a machine learning algorithm to predict the probability of success of a clinical trial for a new treatment.

To solve the above problem, an apparatus for predicting clinical trial results according to an embodiment of the present invention includes an input unit for inputting clinical trial-related information, and a processing unit that processes clinical trial case data to generate training data, performs machine learning on the training data to generate a prediction model for predicting clinical trial results, and uses the generated prediction model to predict the clinical trial result corresponding to the clinical trial-related information.

The processing unit includes a learning module that generates the prediction model by performing machine learning on the training data to determine the success rate of each clinical trial, and a prediction module that predicts the clinical trial success rate using the prediction model.

The learning module performs the machine learning using a plurality of first learning algorithms and one second learning algorithm.

The learning module performs a primary learning step in which each of the plurality of first learning algorithms learns the relationship between the clinical trial conditions and the clinical trial results in a dataset for the primary learning step extracted from the training data.

The learning module performs a secondary learning step that trains the second learning algorithm, which determines the algorithm with the best predictive power among the plurality of first learning algorithms. The second learning algorithm is trained by considering both the results that the primarily trained first learning algorithms predict from the clinical trial conditions of a dataset for the secondary learning step extracted from the training data, and the clinical trial results of that dataset.

The learning module performs a testing and optimization step. In this step, the first learning algorithms and the second learning algorithm that have gone through the primary and secondary learning steps predict clinical trial results for a test dataset extracted from the training data, a performance index of the prediction model is calculated from the predicted results and the actual clinical trial results, and the parameters of the first learning algorithms and the second learning algorithm are optimized according to the calculated performance index.

The learning module repeats the primary learning step, the secondary learning step, and the testing and optimization step until the performance index of the prediction model reaches a target performance index.
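The repetition described above can be sketched as a simple loop. This is a minimal illustration only: `train_until_target`, `run_cycle`, `adjust`, and the `max_rounds` safety cap are hypothetical names and choices that do not appear in the patent, and the toy callables merely stand in for the real learning and testing steps.

```python
def train_until_target(run_cycle, adjust, params,
                       target_accuracy, target_precision, max_rounds=20):
    """Repeat the learning/testing cycle until both target indices are reached.

    run_cycle(params) -> (accuracy, precision): stands in for the primary
        learning, secondary learning, and testing steps (hypothetical).
    adjust(params, accuracy, precision) -> new params: stands in for the
        parameter optimization (hypothetical).
    max_rounds is an assumed safety cap; the patent only states the
        stopping condition.
    """
    history = []
    for round_no in range(1, max_rounds + 1):
        accuracy, precision = run_cycle(params)
        history.append((round_no, accuracy, precision))
        if accuracy >= target_accuracy and precision >= target_precision:
            break  # target performance index reached
        params = adjust(params, accuracy, precision)
    return params, history


# Toy stand-ins: the scores improve as a single made-up parameter grows.
def toy_cycle(params):
    depth = params["depth"]
    return min(1.0, 0.5 + 0.1 * depth), min(1.0, 0.4 + 0.1 * depth)


def toy_adjust(params, accuracy, precision):
    return {**params, "depth": params["depth"] + 1}


final_params, history = train_until_target(
    toy_cycle, toy_adjust, {"depth": 1},
    target_accuracy=0.85, target_precision=0.75,
)
```

In practice `run_cycle` would retrain both levels of algorithms and score them on the test dataset, and `adjust` would hold whatever parameter search is chosen.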

The learning module calculates the prediction accuracy and the prediction precision of the prediction model as the performance index of the prediction model.

The plurality of first learning algorithms include a K-Nearest Neighbor algorithm, a Gradient Boosting Machine algorithm, a Neural Network algorithm, a Random Forest algorithm, an Extra Trees algorithm, and a Logistic Regression algorithm.

The second learning algorithm is implemented as a logistic regression algorithm.
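One common way to combine several first-level algorithms with a logistic regression second level is a hold-out stacking (blending) scheme, sketched below in plain Python. This is an interpretation, not the patent's implementation: only two small first-level learners (k-nearest neighbours and logistic regression) stand in for the six listed algorithms, the second level weights their predictions rather than explicitly selecting one, and every function name and the synthetic data are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, labels, epochs=300, lr=0.5):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b


def predict_logistic(model, x):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_knn(rows, labels, k=5):
    """K-nearest neighbours simply memorises the training cases."""
    return list(rows), list(labels), k


def predict_knn(model, x):
    rows, labels, k = model
    nearest = sorted(
        zip(rows, labels),
        key=lambda pair: sum((a - c) ** 2 for a, c in zip(pair[0], x)),
    )[:k]
    return sum(label for _, label in nearest) / len(nearest)


# Stand-ins for the first learning algorithms (assumption: two instead of six).
BASE_LEARNERS = [(fit_knn, predict_knn), (fit_logistic, predict_logistic)]


def train_stack(level1, level2):
    """Primary step on level1 data, then secondary step on level2 data."""
    x1, y1 = level1
    x2, y2 = level2
    base_models = [(predict, fit(x1, y1)) for fit, predict in BASE_LEARNERS]
    # First-level predictions on the level-2 conditions become meta-features.
    meta_rows = [[predict(m, x) for predict, m in base_models] for x in x2]
    return base_models, fit_logistic(meta_rows, y2)


def predict_success(stack, x):
    base_models, meta_model = stack
    return predict_logistic(meta_model, [predict(m, x) for predict, m in base_models])


def make_cases(n, rng):
    """Synthetic 'trials': success when the two made-up features sum above 1."""
    rows = [[rng.random(), rng.random()] for _ in range(n)]
    return rows, [1 if a + c > 1.0 else 0 for a, c in rows]


rng = random.Random(0)
stack = train_stack(make_cases(200, rng), make_cases(60, rng))
p_high = predict_success(stack, [0.95, 0.95])
p_low = predict_success(stack, [0.05, 0.05])
```

The learned second-level weights indicate how much each first-level algorithm is trusted, which is one way to read "determining the algorithm with the best predictive power".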

Meanwhile, a method for predicting clinical trial results according to an embodiment of the present invention includes generating a prediction model for predicting clinical trial results by performing machine learning on clinical trial case data, receiving clinical trial-related information from a user terminal after the prediction model is generated, and predicting the clinical trial result corresponding to the clinical trial-related information using the prediction model.

Generating the prediction model includes a primary learning step, a secondary learning step, and a testing and optimization step. In the primary learning step, each of a plurality of first learning algorithms learns the relationship between the clinical trial conditions and the clinical trial results in a dataset for the primary learning step extracted from the clinical trial case data. In the secondary learning step, a second learning algorithm, which determines the algorithm with the best predictive power among the plurality of first learning algorithms, is trained by considering both the results that the primarily trained first learning algorithms predict from the clinical trial conditions of a dataset for the secondary learning step extracted from the clinical trial case data, and the clinical trial results of that dataset. In the testing and optimization step, the first learning algorithms and the second learning algorithm that have gone through the primary and secondary learning steps predict clinical trial results for a test dataset extracted from the clinical trial case data, a performance index of the prediction model is calculated from the predicted results and the actual clinical trial results, and the parameters of the first learning algorithms and the second learning algorithm are optimized according to the calculated performance index.

The method repeats the primary learning step, the secondary learning step, and the testing and optimization step until the performance index of the prediction model reaches a target performance index.

In the testing and optimization step, the prediction accuracy and the prediction precision of the prediction model are calculated as the performance index of the prediction model.

The prediction accuracy is the proportion of clinical trial cases correctly predicted by the prediction model among all clinical trial cases.

The prediction precision is the proportion of correctly predicted clinical trial cases among all clinical trial cases that the prediction model predicted to be successful.
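The two definitions above translate directly into a few lines of Python. The function name `accuracy_and_precision` and the sample outcomes are illustrative assumptions; returning 0.0 when no trial is predicted to succeed is also a choice made here, not something the patent specifies.

```python
def accuracy_and_precision(predicted, actual):
    """Compute the two performance indices from boolean outcomes.

    predicted[i] / actual[i] are True when trial i is predicted to succeed /
    actually succeeded.
    """
    pairs = list(zip(predicted, actual))
    if not pairs:
        raise ValueError("at least one clinical trial case is required")

    # Accuracy: correctly predicted cases among all cases.
    accuracy = sum(1 for p, a in pairs if p == a) / len(pairs)

    # Precision: actual successes among the cases predicted to succeed.
    outcomes_of_predicted_success = [a for p, a in pairs if p]
    if outcomes_of_predicted_success:
        precision = sum(outcomes_of_predicted_success) / len(outcomes_of_predicted_success)
    else:
        precision = 0.0  # assumption: no positive predictions yields 0.0
    return accuracy, precision


# Made-up example: five trials, three predicted to succeed.
predicted = [True, True, False, True, False]
actual = [True, False, False, True, True]
accuracy, precision = accuracy_and_precision(predicted, actual)
```

Here three of the five cases are predicted correctly, and two of the three predicted successes actually succeeded.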

Because the present invention can predict the probability of success of a clinical trial for a new treatment using a machine learning algorithm, it can shorten the clinical trial period for the new treatment and reduce the cost of the clinical trial.

FIG. 1 is a configuration diagram of a system that provides a clinical trial result prediction service according to an embodiment of the present invention.
FIG. 2 is a block diagram of an apparatus for predicting clinical trial results according to an embodiment of the present invention.
FIG. 3 is a flowchart of a machine learning process according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining the primary learning step of FIG. 3.
FIG. 5 is a diagram for explaining the secondary learning step of FIG. 3.
FIG. 6 is a diagram for explaining the testing and optimization step of FIG. 3.
FIG. 7 is a diagram for explaining the prediction precision of a prediction model related to the present invention.
FIG. 8 is a flowchart of a method for predicting clinical trial results according to an embodiment of the present invention.
FIGS. 9A to 9C are diagrams showing the screen for each step shown in FIG. 8.
FIG. 10 is a block diagram of a computing system that executes a method for predicting clinical trial results according to an embodiment of the present invention.

Hereinafter, some embodiments of the present invention are described in detail with reference to the exemplary drawings. In assigning reference numerals to the components in each drawing, note that the same components are given the same numerals wherever possible, even when they appear in different drawings. In describing the embodiments of the present invention, detailed descriptions of related well-known configurations or functions are omitted where they would obscure the understanding of the embodiments.

In describing the components of the embodiments of the present invention, terms such as first, second, A, B, (a), and (b) may be used. These terms serve only to distinguish one component from another and do not limit the nature, sequence, or order of the components. Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in the present application.

FIG. 1 is a configuration diagram of a system that provides a clinical trial result prediction service according to an embodiment of the present invention.

As shown in FIG. 1, the system that provides a clinical trial result prediction service according to an embodiment of the present invention includes a clinical trial result prediction apparatus 100 and a user terminal 200 connected through a network. Here, the network is a wired or wireless Internet network and includes a LAN (Local Area Network), a WAN (Wide Area Network), Ethernet, a WLAN (Wireless LAN, WiFi), Wibro (Wireless broadband), Wimax (Worldwide Interoperability for Microwave Access), HSDPA (High Speed Downlink Packet Access), and the like.

The clinical trial result prediction apparatus (hereinafter, the prediction apparatus) 100 serves as a web server that provides a web service. The prediction apparatus 100 performs a login procedure at the user's request. That is, when the user enters an ID and a password as identification information through the user terminal 200, the prediction apparatus 100 receives the entered ID and password from the user terminal 200, checks whether the user is registered in a database (DB), and grants or denies permission to use the web service.

When the prediction apparatus 100 receives information related to conducting a clinical trial (clinical trial-related information) from the user terminal 200, it predicts the success rate of the clinical trial using the prediction model for which machine learning has been completed. The prediction apparatus 100 transmits the prediction result to the user terminal 200 through the network.

The user terminal 200 uses the web service provided by the prediction apparatus 100 through a web browser. The user terminal 200 transmits the clinical trial-related information (clinical trial conditions) that the user enters through an input means to the prediction apparatus 100. The user terminal 200 also receives the prediction result transmitted from the prediction apparatus 100 and outputs it through an output means.

The user terminal 200 may be implemented as a notebook computer 200-1, a mobile communication terminal 200-2, a desktop computer 200-3, or the like. The user terminal 200 may include one or more processors, a memory, a communication module, and the like.

FIG. 2 is a block diagram of an apparatus for predicting clinical trial results according to an embodiment of the present invention.

As shown in FIG. 2, the clinical trial result prediction apparatus 100 includes a communication unit 110, an input unit 120, a storage unit 130, an output unit 140, and a processing unit 150.

The communication unit 110 performs data communication with the user terminal 200. The communication unit 110 exchanges data with the user terminal 200 through a network such as a LAN, WAN, Ethernet, WiFi, Wibro, Wimax, or HSDPA.

The communication unit 110 receives user information (ID, password, and the like) and/or clinical trial-related information (clinical trial conditions or clinical trial features) transmitted from the user terminal 200. Under the control of the processing unit 150, the communication unit 110 transmits the clinical trial success rate prediction result to the user terminal 200.

The communication unit 110 may also receive clinical trial case data (trial instances). The clinical trial case data may be obtained from the US Food and Drug Administration (FDA). For example, the prediction apparatus 100 may access the FDA database through the communication unit 110 and extract (search for) clinical trial case data.

The communication unit 110 may transmit the received data to the processing unit 150 either directly or through the input unit 120.

The input unit 120 may process the data received through the communication unit 110 and transmit it to the processing unit 150. That is, the input unit 120 pre-processes the user information and/or the clinical trial-related information into a data format that the processing unit 150 can handle, and then transmits it to the processing unit 150.

The input unit 120 also processes the clinical trial case data and inputs it to the processing unit 150 as training data. The input unit 120 pre-processes the dataset (clinical trial case data) extracted from the FDA database. For example, the input unit 120 may correct city names written as "NY" or "new york" in the extracted dataset to "New York", and may add new features such as duration, number of cities, and number of countries, as well as the target disease (therapy area).
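The pre-processing described above might look like the following sketch. The record layout (`sites`, `start_date`, `end_date`), the alias table, and the derived field names are hypothetical; the patent gives only the "NY" / "new york" example and the list of derived features.

```python
from datetime import date

# Hypothetical alias table; the patent mentions only these two spellings.
CITY_ALIASES = {"ny": "New York", "new york": "New York"}


def normalize_city(raw):
    """Map known spelling variants of a city name to one canonical form."""
    cleaned = raw.strip()
    return CITY_ALIASES.get(cleaned.lower(), cleaned)


def preprocess_trial(record):
    """Return a copy of the record with cleaned sites and derived features."""
    sites = [(normalize_city(city), country.strip())
             for city, country in record["sites"]]
    cleaned = dict(record)
    cleaned["sites"] = sites
    # Derived features: trial duration, number of cities, number of countries.
    cleaned["duration_days"] = (record["end_date"] - record["start_date"]).days
    cleaned["n_cities"] = len(set(sites))
    cleaned["n_countries"] = len({country for _, country in sites})
    return cleaned


# Made-up raw record with two spellings of the same city.
raw_record = {
    "sites": [("NY", "US"), ("new york", "US"), ("Seoul", "KR")],
    "start_date": date(2015, 1, 1),
    "end_date": date(2016, 1, 1),
}
cleaned = preprocess_trial(raw_record)
```

Counting distinct (city, country) pairs keeps two cities with the same name in different countries apart.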

The clinical trial case data includes the clinical trial conditions and clinical trial results of actual clinical trial cases. Here, the clinical trial conditions include at least one of the features listed in [Table 1].

[Table 1]
Feature: Description
Phase: The phase of the drug clinical trial, divided into four phases.
- Phase 1: Observes what physiological effects occur when a small amount of a substance whose safety has been verified to a certain level in animals is administered to or taken by humans. Verifying safety is the key objective.
- Phase 2: Verifies whether a substance whose safety in the body has been verified produces the intended effect.
- Phase 3: Verifies whether the safety and efficacy established in Phases 1 and 2 are confirmed at a similar level in a statistically significant number of subjects.
- Phase 4: Studies the long-term effects (side effects, unknown effects, etc.) on patients of a drug that has received marketing approval.
Indication: The disease targeted by an individual drug. Examples: colorectal cancer and asthma.
Therapy area (target disease): The higher-level category above the indication. Examples: oncology and respiratory diseases.
Gender of Participants: The gender of the subjects who participated in the clinical trial.
Healthiness of Participants: Specifies whether the trial targets subjects suffering from the disease the drug is aimed at or healthy subjects.
Number of Participants: The number of subjects who participated in the clinical trial.
Sponsor: The entity that provides the funding or the drug for the clinical trial. Examples: pharmaceutical companies, government agencies (Ministry of Health and Welfare), and universities.
Study type: The type of clinical trial study, divided into case-control studies, cohort studies, status (cross-sectional) studies, and experimental studies.
Duration: The period over which the clinical trial was conducted.
Geographical Location: The location of the hospital where the clinical trial was conducted. Examples: country, state, and city.
Molecule Type: The material nature of the drug (materialistic feature), classified as a biologic or a chemical drug/compound.
Mechanism of Action: The theoretical mechanism of the drug, that is, a classification of which elements in the body it reacts with or acts on, and in what way, to produce the expected effect. Example: in oncology, mechanisms of action include angiogenesis inhibition and PD-1 immunotherapy.
Target of Action: The element in the body with which the drug directly reacts so that the mechanism of action takes effect. Examples: vascular endothelial growth factor (VEGFr) and macrophages.
Route of Administration: How the drug is taken or administered.
Designation by the food and drug authority (Designation): For drugs with high public need, part of the approval process may be relaxed or shortened, and R&D cost support, tax benefits, and the like may be provided. This feature is an identifier for such a designation.

The input unit 120 processes the clinical trial case data into a form suitable for machine learning and outputs it. For example, the input unit 120 transmits the clinical trial case data to the processing unit 150 in the form of a table consisting of several independent variables and one dependent variable (state variable).
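A table of independent variables plus one dependent variable could be built as in the sketch below. One-hot encoding of the categorical features is an assumption made here (the patent does not say how categories are encoded), and the function name, column names, and sample records are illustrative.

```python
def build_table(trials, categorical, numeric, outcome_key="succeeded"):
    """Turn trial records into (header, rows) with the outcome as last column.

    Categorical features are one-hot encoded (an assumed encoding);
    numeric features are copied as floats; the outcome becomes 1 or 0.
    """
    # One column per (feature, observed value), in a stable order.
    columns = [(name, value)
               for name in categorical
               for value in sorted({trial[name] for trial in trials})]
    header = [f"{name}={value}" for name, value in columns]
    header += list(numeric) + [outcome_key]

    rows = []
    for trial in trials:
        row = [1 if trial[name] == value else 0 for name, value in columns]
        row += [float(trial[name]) for name in numeric]
        row.append(1 if trial[outcome_key] else 0)  # dependent variable
        rows.append(row)
    return header, rows


# Made-up trial records using a few of the features from Table 1.
trials = [
    {"phase": "Phase 2", "molecule_type": "Biologic",
     "n_participants": 120, "succeeded": True},
    {"phase": "Phase 3", "molecule_type": "Chemical",
     "n_participants": 900, "succeeded": False},
]
header, rows = build_table(
    trials, categorical=["phase", "molecule_type"], numeric=["n_participants"]
)
```

Each row then holds the independent variables followed by the single dependent variable, which is the shape most learning algorithms expect.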

The input unit 120 also generates input data in response to user operation. The input unit 120 may consist of a keyboard, a keypad, a touch pad, a touch screen, a mouse, a bar code reader, a QR (Quick Response) code scanner, a joystick, and the like.

The storage unit 130 may store a program for the operation of the processing unit 150 and may temporarily store the input/output data of the processing unit 150. The storage unit 130 may also store a user DB containing user information.

The storage unit 130 stores machine learning algorithms, the prediction model, training data, clinical trial-related information (clinical trial features), and the like. The storage unit 130 may also store data generated during the learning process that uses the machine learning algorithms, as well as the result values predicted by the prediction model.

The storage unit 130 may be installed inside and/or outside the processing unit 150. The storage unit 130 may be implemented as at least one storage medium (recording medium) among a flash memory, a hard disk, an SD card (Secure Digital Card), a RAM (Random Access Memory), a ROM (Read Only Memory), a PROM (Programmable ROM), an EPROM (Erasable and Programmable ROM), an EEPROM (Electrically Erasable and Programmable ROM), a register, a removable disk, web storage, and the like.

The output unit 140 outputs information such as visual, auditory, and/or tactile information, and may include a display, a sound output module, a haptic module, and the like.

The display outputs the information processed by the prediction apparatus 100. For example, when the clinical trial result prediction model is being trained, the display shows the related UI (User Interface) or GUI (Graphic User Interface). The display may include one or more of a liquid crystal display (LCD), a thin film transistor-liquid crystal display (TFT LCD), an organic light-emitting diode (OLED) display, a flexible display, a three-dimensional (3D) display, a transparent display, a head-up display (HUD), and a touch screen.

The sound output module may be implemented as a speaker that outputs audio data stored in the storage unit 130. The haptic module outputs a signal in a form that the user can perceive by touch. For example, the haptic module may be implemented as a vibrator whose vibration intensity and pattern can be controlled.

The processing unit 150 controls the overall operation of the prediction apparatus 100. The processing unit 150 may include at least one of an ASIC (Application Specific Integrated Circuit), a DSP (Digital Signal Processor), PLDs (Programmable Logic Devices), FPGAs (Field Programmable Gate Arrays), a CPU (Central Processing Unit), microcontrollers, and microprocessors.

The processing unit 150 may perform the web server function by executing a web server program stored in the storage unit 130. When the processing unit 150 receives user information through the communication unit 110, it checks whether the user is already registered and grants or denies permission to use the clinical trial result prediction service.

The processing unit 150 includes a learning module 151 that performs machine learning using clinical trial case data and a prediction module 152 that predicts a clinical trial success rate using the machine-learned predictive model. Here, the predictive model predicts the success rate of a clinical trial using a plurality of machine learning algorithms.

The learning module 151 performs a three-stage learning process consisting of a primary learning stage (training, level 1), a secondary learning stage (meta-training, level 2), and a test and optimization stage (testing and optimizing). When the learning module 151 receives learning data through the input unit 120, it classifies the data into a dataset for each stage. For example, when 15000 clinical trial cases are input, the learning module 151 uses random sampling to classify them into a dataset for the primary learning stage, a dataset for the secondary learning stage, and a dataset for the test and optimization stage containing 11000, 2000, and 2000 cases, respectively. The sampling is performed so that the probability distributions of the clinical trial results (trial results) in the three datasets have similar shapes.
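As a non-limiting illustration, the classification into three datasets with similar result distributions could be sketched in Python as a stratified random split. The function name `stratified_split`, the representation of each case as a (conditions, result) pair, and the integer rounding rule are assumptions made for this sketch; the specification states only that random sampling is used and that the distributions should be similar.

```python
import random
from collections import defaultdict


def stratified_split(cases, sizes, seed=0):
    """Split (conditions, result) pairs into len(sizes) datasets whose
    result distributions match, up to rounding.

    Hypothetical helper: each result group is shuffled and then cut in
    the same proportions as the requested dataset sizes.
    """
    total = sum(sizes)
    rng = random.Random(seed)

    # Group the cases by their clinical trial result.
    by_result = defaultdict(list)
    for case in cases:
        by_result[case[1]].append(case)

    splits = [[] for _ in sizes]
    for result in sorted(by_result):
        group = by_result[result]
        rng.shuffle(group)
        start, cumulative = 0, 0
        for i, size in enumerate(sizes):
            cumulative += size
            # Integer rounding of len(group) * cumulative / total.
            end = (len(group) * cumulative + total // 2) // total
            splits[i].extend(group[start:end])
            start = end
    return splits
```

With 15000 cases and sizes (11000, 2000, 2000), each result group is divided in the ratio 11:2:2, so the three datasets keep the same proportion of each result. When a group size is not divisible in that ratio, the dataset sizes may differ from the requested sizes by a few cases.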

In the primary learning stage, the learning module 151 has a plurality of first learning algorithms learn the relationship between the clinical trial conditions (features, Xs) and the clinical trial results (Y) from the dataset for the primary learning stage. The first learning algorithms include a K-Nearest Neighbor (KNN) algorithm, a Gradient Boosting Machine (GBM) algorithm, a Neural Network algorithm, a Random Forest algorithm, an Extra Trees algorithm, and a logistic regression algorithm. These algorithms have different predictive power for the same clinical trial.

In the secondary learning stage, the learning module 151 performs machine learning to determine which of the plurality of first learning algorithms has the best predictive power. The learning module 151 trains a second learning algorithm on the dataset for the secondary learning stage. The second learning algorithm may be implemented as a logistic regression algorithm.

When the primary learning and the secondary learning are complete, the learning module 151 performs the test and optimization stage. In this stage, the learning module 151 tests the trained first learning algorithms and the trained second learning algorithm on the dataset for the test and optimization stage. The learning module 151 then optimizes the parameters of each algorithm based on the test results.

When testing and optimization of the trained first learning algorithms and the trained second learning algorithm are complete, the learning module 151 generates the predictive model and stores it in the storage unit 130. The learning module 151 may periodically update the predictive model through machine learning.

The prediction module 152 receives clinical trial-related information transmitted from the user terminal 200 through the communication unit 110. Here, the input unit 120 may process the clinical trial-related information received through the communication unit 110 and provide it to the prediction module 152.

The prediction module 152 uses the predictive model stored in the storage unit 130 to predict the success rate of the clinical trial from the received clinical trial-related information. The prediction module 152 transmits the prediction result to the user terminal 200 that requested the success rate prediction. The user terminal 200 shows the success rate prediction result provided by the prediction module 152 on its display.

FIG. 3 is a flowchart illustrating a machine learning process according to an embodiment of the present invention, FIG. 4 is a diagram for explaining the primary learning stage of FIG. 3, FIG. 5 is a diagram for explaining the secondary learning stage of FIG. 3, FIG. 6 is a diagram for explaining the test and optimization stage of FIG. 3, and FIG. 7 is a diagram for explaining the prediction precision of a predictive model according to the present invention.

First, the learning module 151 of the clinical trial result prediction apparatus 100 performs the primary learning (S110). As shown in FIG. 4, the learning module 151 provides the primary learning dataset DS1 as input data to the plurality of first learning algorithms AL1. The primary learning dataset DS1 includes the experimental conditions and experimental results of clinical trials that were actually carried out. Each of the first learning algorithms AL1-1 to AL1-6 learns the relationship between the experimental conditions and the experimental results of the clinical trials in the primary learning dataset DS1. The parameters of the first learning algorithms AL1-1 to AL1-6 are determined through this primary learning.

When training of the first learning algorithms AL1 is complete, the learning module 151 performs the secondary learning (S120). As shown in FIG. 5, the learning module 151 provides the clinical trial conditions (features) of the secondary learning dataset DS2 as input to the first learning algorithms AL1 trained in the primary learning. Each of the first learning algorithms AL1-1 to AL1-6 predicts clinical trial results from the secondary learning dataset, with the actual experimental results withheld, and outputs the predicted results P1 to P6. The second learning algorithm AL2 uses the prediction results P1 to P6 output by the first learning algorithms AL1 together with the actual clinical trial results of the secondary learning dataset to learn which algorithm has the best predictive power.
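The secondary learning described above resembles stacked generalization, and a minimal Python sketch of it might look as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: the results are reduced to a binary label (1 = success, 0 = otherwise), the predictions P1 to Pk are success probabilities, and the function names, the learning rate `lr`, and the number of epochs are hypothetical. The second learning algorithm is modeled as a logistic regression fitted by plain gradient descent.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_meta_learner(level1_preds, outcomes, lr=0.5, epochs=2000):
    """Fit logistic regression weights on level-1 predictions (P1..Pk)
    against the actual results (1 = success, 0 = otherwise)."""
    k = len(level1_preds[0])
    n = len(outcomes)
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for preds, y in zip(level1_preds, outcomes):
            z = bias + sum(w * p for w, p in zip(weights, preds))
            err = sigmoid(z) - y  # gradient of the log loss w.r.t. z
            for j in range(k):
                grad_w[j] += err * preds[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def meta_predict(weights, bias, preds):
    """Combine the level-1 predictions into one success probability."""
    return sigmoid(bias + sum(w * p for w, p in zip(weights, preds)))
```

In this sketch, a first learning algorithm whose predictions track the actual results receives a larger weight than one whose predictions carry no information, which is one way to express "determining the algorithm with the best predictive power".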

When the secondary learning is complete, the learning module 151 tests the trained first learning algorithms AL1 and the trained second learning algorithm AL2 and, based on the test results, optimizes their parameters (S130). As shown in FIG. 6, the learning module 151 provides the clinical trial conditions of the test dataset DS3 as input to the first learning algorithms AL1 trained through the two learning stages. Each of the first learning algorithms AL1-1 to AL1-6 predicts clinical trial results from the clinical trial conditions of the test dataset DS3 and outputs the results P1' to P6'. Based on the outputs P1' to P6' of the first learning algorithms AL1, the second learning algorithm AL2 outputs the prediction result of the algorithm with the best predictive power. The learning module 151 calculates a performance index of the trained predictive model (the combination of the first learning algorithms and the second learning algorithm) from the prediction results output by the second learning algorithm AL2 and the actual clinical trial results of the test dataset DS3 (S131). As the performance index, the learning module 151 calculates the prediction accuracy (general accuracy) over the entire test dataset DS3 and the prediction precision for the clinical trial success cases (achieved cases) in the test dataset DS3.

Here, the prediction accuracy can be expressed as "number of correctly classified clinical trial cases / total number of clinical trial cases" and represents the probability that the predictive model correctly estimates the actual clinical trial result. The prediction precision can be expressed as "number of clinical trial cases correctly classified as success / total number of clinical trial cases predicted as success by the predictive model" and represents the proportion of cases predicted as 'success' that are actually 'success'.

As shown in FIG. 7, when the predictive model predicts 1042 (=767+116+136+23) clinical trial cases as success and 767 of those cases actually succeeded, the prediction precision is 73.6% (=767/1042×100). That is, a clinical trial that the predictive model predicts as 'success' has a 73.6% probability of actually resulting in 'success'.
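The two performance indices can be computed as in the following Python sketch. The function names and the status strings are assumptions made for illustration. The counts 767, 116, 136, and 23 come from FIG. 7 as quoted above, but which non-success status each of the last three counts belongs to is not stated in this passage and is assigned arbitrarily here.

```python
def accuracy(actual, predicted):
    """Correctly classified cases / all cases."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def precision(actual, predicted, positive="achieved"):
    """Cases correctly predicted as success / all cases predicted as success."""
    predicted_pos = [a for a, p in zip(actual, predicted) if p == positive]
    if not predicted_pos:
        return 0.0
    return sum(1 for a in predicted_pos if a == positive) / len(predicted_pos)


# The 1042 cases that the model predicted as "achieved" (FIG. 7).
# Assigning 116, 136, and 23 to specific statuses is an assumption.
actual = (["achieved"] * 767 + ["inconclusive"] * 116
          + ["not achieved"] * 136 + ["partially achieved"] * 23)
predicted = ["achieved"] * len(actual)

fig7_precision = precision(actual, predicted)
print(f"{fig7_precision:.1%}")  # prints 73.6%
```

Only the precision is evaluated on the FIG. 7 counts, because those counts cover only the cases predicted as success; the accuracy would require the full test dataset.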

Because this embodiment uses the prediction precision as a performance index in addition to the prediction accuracy, the risk that a case predicted as 'success' by the predictive model actually turns out to be a failure can be managed more accurately.

The learning module 151 optimizes the parameters of each algorithm AL1 and AL2 based on the calculated performance index and a target performance index (S132). The learning module 151 adjusts the parameters of each algorithm AL1 and AL2 so that the performance index of the predictive model can reach the target performance index.

For example, the learning module 151 may optimize the algorithms by adjusting their parameters as shown in [Table 2].

Algorithm: Parameters

KNN algorithm:
  Number of neighbors = 15
  Weight = "distance"

GBM algorithm:
  Learning rate = 0.05
  Subsample = 0.5 (only 50% of the total samples are used when building one tree)
  max_depth = 6 (depth of each tree, limited to avoid overfitting)
  Number of estimators = 40 (number of trees, limited to avoid overfitting)

Neural network algorithm:
  Number of hidden layers = 2
  Number of neurons per layer = (64, 16)
  Activation function for hidden layers = 'relu' ('relu' stands for "Rectified Linear Units")
  Activation function for output layer = 'softmax'
  Dropout = 0.2 (20% of the neurons in the first layer are intentionally dropped to avoid overfitting)

Random forest / extra trees algorithm:
  Number of estimators = 150
  Minimum number of samples in the leaves = 3 (to avoid overfitting)

Logistic regression algorithm:
  N/A
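For illustration, the parameter values of [Table 2] could be held in a plain Python configuration such as the one below. The dictionary name, the algorithm keys, and the keyword names are assumptions: they loosely follow naming conventions common in machine learning libraries, but the specification does not name any library, and the neural network entries in particular (`hidden_activation`, `output_activation`, `dropout`) are hypothetical labels rather than parameters of a specific API.

```python
# Hypothetical configuration mirroring [Table 2]; key names are assumptions.
LEVEL1_PARAMS = {
    "knn": {"n_neighbors": 15, "weights": "distance"},
    "gbm": {
        "learning_rate": 0.05,
        "subsample": 0.5,      # 50% of the samples per tree
        "max_depth": 6,        # limits tree depth to avoid overfitting
        "n_estimators": 40,    # number of trees
    },
    "neural_network": {
        "hidden_layer_sizes": (64, 16),  # two hidden layers
        "hidden_activation": "relu",
        "output_activation": "softmax",
        "dropout": 0.2,        # applied to the first layer
    },
    "random_forest": {"n_estimators": 150, "min_samples_leaf": 3},
    "extra_trees": {"n_estimators": 150, "min_samples_leaf": 3},
    "logistic_regression": {},  # N/A in [Table 2]
}

for name, params in LEVEL1_PARAMS.items():
    print(name, params)
```

Keeping the values in one structure makes it straightforward for an optimization step to adjust a parameter and retrain, since each algorithm can be rebuilt from its entry.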

The learning module 151 repeats steps S110 to S130 until the performance index of the predictive model, which is the combination (ensemble) of the trained first learning algorithms (AL1: AL1-1 to AL1-6) and the trained second learning algorithm AL2, reaches the target performance index.

When the performance index of the trained predictive model reaches the target performance index, the learning module 151 adopts that predictive model as the final clinical trial result prediction model (S140). The learning module 151 stores the generated predictive model in the storage unit 130.
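The repetition of S110 to S130 until the target performance index is reached can be outlined in Python as follows. This is a sketch under assumptions: `train`, `evaluate`, and `adjust` are hypothetical callables standing in for the learning stages, the performance index calculation, and the parameter adjustment, and the `max_rounds` cap is added here for safety and is not part of the specification.

```python
def fit_until_target(train, evaluate, adjust, params, target, max_rounds=20):
    """Repeat train -> test -> adjust until both indices meet the target."""
    for round_no in range(1, max_rounds + 1):
        model = train(params)                          # S110 and S120
        accuracy, precision = evaluate(model)          # S131
        if (accuracy >= target["accuracy"]
                and precision >= target["precision"]):
            return model, params, round_no             # S140
        params = adjust(params, accuracy, precision)   # S132
    raise RuntimeError("target performance index not reached")
```

The function returns the model together with the parameters and the round in which the target was met, so that the stored model can be traced back to the settings that produced it.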

FIG. 8 is a flowchart illustrating a clinical trial result prediction method according to an embodiment of the present invention, and FIGS. 9A to 9C are views showing the screen for each step shown in FIG. 8.

As shown in FIG. 8, the processing unit 150 of the prediction apparatus 100 performs a log-in procedure in response to a user request (S210). For example, at the request of the user terminal 200, the processing unit 150 of the prediction apparatus 100 transmits to the user terminal 200 a web page (login page) on which user information for logging in can be entered. The user terminal 200 displays the login page on its display screen through a web browser, as shown in FIG. 9A. The user operates the input means of the user terminal 200 to enter an ID and a password and presses the 'sign in' button. The user terminal 200 transmits the ID and password entered by the user to the prediction apparatus 100. The processing unit 150 of the prediction apparatus 100 receives the user information, including the ID and password, through the communication unit 110, checks on the basis of the received user information whether the user is registered, and approves or denies the login.

The processing unit 150 receives clinical trial-related information through the input unit 120 (S220). When the user login is complete, the processing unit 150 provides the user terminal 200 with a web page, as shown in FIG. 9B, on which information related to the clinical trial whose result is to be predicted (clinical trial-related information) can be entered. The user terminal 200 displays the web page on its display screen and, when the user enters clinical trial-related information in the form on the web page, transmits the entered information (phase, target disease, subject information, and the like) to the prediction apparatus 100. The input unit 120 of the prediction apparatus 100 preprocesses the clinical trial-related information received through the communication unit 110 and passes it to the processing unit 150.

When clinical trial-related information is input from the user terminal 200, the processing unit 150 predicts the clinical trial result using the predictive model for which machine learning has been completed (S230). The processing unit 150 passes the clinical trial-related information received through the communication unit 110 to the prediction module 152 via the input unit 120, and the prediction module 152 uses the predictive model stored in the storage unit 130 to predict the success rate of the clinical trial from that information.

The processing unit 150 outputs the predicted clinical trial result (S240). The processing unit 150 transmits a web page showing the predicted clinical trial result to the user terminal 200. The user terminal 200 displays the predicted clinical trial result provided by the prediction apparatus 100. As shown in FIG. 9C, the clinical trial result is divided into four statuses, namely achieved, inconclusive, not achieved, and partially achieved, and is displayed as the probability of each status (64.07%, 13.57%, 20.49%, and 1.87%).
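A prediction result of this form could be prepared for display as in the Python sketch below. The names `STATUSES` and `summarize` are hypothetical, and pairing the four probabilities with the four statuses in the order in which both are listed above is an assumption of this sketch.

```python
STATUSES = ("achieved", "inconclusive", "not achieved", "partially achieved")


def summarize(probabilities):
    """Pair each status with its probability and report the most likely one."""
    if len(probabilities) != len(STATUSES):
        raise ValueError("expected one probability per status")
    by_status = dict(zip(STATUSES, probabilities))
    most_likely = max(by_status, key=by_status.get)
    lines = [f"{status}: {p:.2%}" for status, p in by_status.items()]
    return most_likely, lines


# Example values taken from FIG. 9C as quoted in the text.
most_likely, lines = summarize([0.6407, 0.1357, 0.2049, 0.0187])
for line in lines:
    print(line)
```

Reporting all four probabilities rather than a single label lets the user see how much weight the model places on the inconclusive and partially achieved outcomes.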

FIG. 10 is a block diagram showing a computing system that executes a clinical trial result prediction method according to an embodiment of the present invention.

Referring to FIG. 10, the computing system 1000 may include at least one processor 1100, a memory 1300, a user interface input device 1400, a user interface output device 1500, a storage 1600, and a network interface 1700, which are connected through a bus 1200.

The processor 1100 may be a central processing unit (CPU) or a semiconductor device that processes instructions stored in the memory 1300 and/or the storage 1600. The memory 1300 and the storage 1600 may include various types of volatile or nonvolatile storage media. For example, the memory 1300 may include a ROM (Read Only Memory) and a RAM (Random Access Memory).

Accordingly, the steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by the processor 1100, or in a combination of the two. The software module may reside in a storage medium (that is, the memory 1300 and/or the storage 1600) such as a RAM, a flash memory, a ROM, an EPROM, an EEPROM, a register, a hard disk, a removable disk, or a CD-ROM. An exemplary storage medium is coupled to the processor 1100, and the processor 1100 can read information from and write information to the storage medium. Alternatively, the storage medium may be integral with the processor 1100. The processor and the storage medium may reside in an application specific integrated circuit (ASIC). The ASIC may reside in a user terminal. Alternatively, the processor and the storage medium may reside in a user terminal as separate components.

The above description merely illustrates the technical idea of the present invention, and a person of ordinary skill in the art to which the present invention pertains may make various modifications and variations without departing from the essential characteristics of the present invention.

Accordingly, the embodiments disclosed in the present invention are intended to explain, not to limit, the technical idea of the present invention, and the scope of the technical idea of the present invention is not limited by these embodiments. The scope of protection of the present invention should be interpreted according to the claims below, and all technical ideas within a scope equivalent thereto should be interpreted as falling within the scope of rights of the present invention.

100: prediction apparatus
110: communication unit
120: input unit
130: storage unit
140: output unit
150: processing unit

Claims

A clinical trial result prediction apparatus comprising:
an input unit for inputting clinical trial-related information; and
a processing unit that processes clinical trial case data to generate learning data, performs machine learning on a dataset for a primary learning stage, a dataset for a secondary learning stage, and a dataset for a test and optimization stage classified from the learning data to generate a predictive model for predicting clinical trial results, and predicts a clinical trial result according to the clinical trial-related information using the generated predictive model,
wherein the clinical trial case data includes clinical trial conditions and clinical trial results of actual clinical trial cases,
wherein probability distributions of the clinical trial results of the dataset for the primary learning stage, the dataset for the secondary learning stage, and the dataset for the test and optimization stage have shapes similar to one another,
wherein the processing unit performs the machine learning using a plurality of first learning algorithms and one second learning algorithm, calculates a performance index of the predictive model based on the clinical trial results that the trained plurality of first learning algorithms and the trained second learning algorithm predict from a test dataset extracted from the learning data and on the actual clinical trial results, and optimizes parameters of the trained plurality of first learning algorithms and the trained second learning algorithm according to the calculated performance index of the predictive model,
wherein the second learning algorithm performs learning to determine the algorithm with the best predictive power among the plurality of first learning algorithms, and
wherein the performance index of the predictive model includes a prediction accuracy of the predictive model for the entire test dataset and a prediction precision for clinical trial success cases in the test dataset.

The apparatus of claim 1,
wherein the processing unit comprises:
a learning module that generates the predictive model by performing machine learning to determine the success rate of each clinical trial from the learning data; and
a prediction module that predicts a clinical trial success rate using the predictive model.

delete

The apparatus of claim 2,
wherein the learning module
performs a primary learning stage in which each of the plurality of first learning algorithms learns the relationship between the clinical trial conditions and the clinical trial results of the dataset for the primary learning stage extracted from the learning data.

The apparatus of claim 4,
wherein the learning module
performs a secondary learning stage of training the second learning algorithm in consideration of the results that the primarily trained plurality of first learning algorithms predict from the clinical trial conditions of the dataset for the secondary learning stage extracted from the learning data, and of the clinical trial results of the dataset for the secondary learning stage.

delete

The apparatus of claim 5,
wherein the learning module
repeatedly performs the primary learning stage, the secondary learning stage, and the test and optimization stage until the performance index of the predictive model reaches a target performance index.

delete

The apparatus of claim 1,
wherein the plurality of first learning algorithms
include a K-Nearest Neighbor algorithm, a Gradient Boosting Machine algorithm, a Neural Network algorithm, a Random Forest algorithm, an Extra Trees algorithm, and a logistic regression algorithm.

The apparatus of claim 1,
wherein the second learning algorithm
is implemented as a logistic regression algorithm.

A clinical trial result prediction method comprising:
generating, by a processing unit, a predictive model that predicts clinical trial results by performing machine learning on a dataset for a primary learning stage, a dataset for a secondary learning stage, and a dataset for a test and optimization stage classified from clinical trial case data;
receiving, by the processing unit, clinical trial-related information from a user terminal after the predictive model is generated; and
predicting, by the processing unit, a clinical trial result according to the clinical trial-related information using the predictive model,
wherein the clinical trial case data includes clinical trial conditions and clinical trial results of actual clinical trial cases,
wherein probability distributions of the clinical trial results of the dataset for the primary learning stage, the dataset for the secondary learning stage, and the dataset for the test and optimization stage have shapes similar to one another,
wherein the generating of the predictive model comprises:
performing, by the processing unit, the machine learning using a plurality of first learning algorithms and one second learning algorithm; and
a test and optimization step in which the processing unit calculates a performance index of the predictive model based on the clinical trial results that the trained plurality of first learning algorithms and the trained second learning algorithm predict from a test dataset extracted from the clinical trial case data and on the actual clinical trial results, and optimizes parameters of the trained plurality of first learning algorithms and the trained second learning algorithm according to the calculated performance index of the predictive model,
wherein the second learning algorithm performs learning to determine the algorithm with the best predictive power among the plurality of first learning algorithms, and
wherein the performance index of the predictive model includes a prediction accuracy of the predictive model for the entire test dataset and a prediction precision for clinical trial success cases in the test dataset.

The method of claim 11,
wherein the generating of the predictive model comprises:
a primary learning step in which the processing unit has each of the plurality of first learning algorithms learn the relationship between the clinical trial conditions and the clinical trial results in the dataset for the primary learning stage extracted from the clinical trial case data; and
a secondary learning step in which the processing unit trains the second learning algorithm in consideration of the results that the primarily trained plurality of first learning algorithms predict from the clinical trial conditions of the dataset for the secondary learning stage extracted from the clinical trial case data, and of the clinical trial results of the dataset for the secondary learning stage.

The method of claim 12,
wherein the processing unit repeatedly performs the primary learning step, the secondary learning step, and the test and optimization step until the performance index of the predictive model reaches a target performance index.

delete

The method of claim 12,
wherein the prediction accuracy
is the proportion of clinical trial cases accurately predicted by the predictive model among all clinical trial cases.

The method of claim 12,
wherein the prediction precision
is the proportion of correctly predicted clinical trial cases among all clinical trial cases predicted as success by the predictive model.

The method of claim 12,
wherein the plurality of first learning algorithms
include a K-Nearest Neighbor algorithm, a Gradient Boosting Machine algorithm, a Neural Network algorithm, a Random Forest algorithm, an Extra Trees algorithm, and a logistic regression algorithm.

The method of claim 12,
wherein the second learning algorithm
is implemented as a logistic regression algorithm.