KR20230040261A - System and method for optimizing general purpose biological network for drug response prediction using meta-reinforcement learning agent

Info

Publication number: KR20230040261A
Application number: KR1020220062120A
Authority: KR
Inventors: 조광현; 김윤성; 한영현
Original assignee: 한국과학기술원
Priority date: 2021-09-15
Filing date: 2022-05-20
Publication date: 2023-03-22

Abstract

Disclosed is a cancer treatment candidate drug determining method which comprises the following steps of: allowing a simulation device to generate a plurality of specific perturbation networks by reflecting mutation information of a first cancer cell line in each of a plurality of drug response networks for a plurality of drugs; and selecting a plurality of candidate drugs among the plurality of drugs based on a plurality of death probabilities of the first cancer cell line outputted by the plurality of specific perturbation networks.

Description

System and method for optimizing general purpose biological network for drug response prediction using meta-reinforcement learning agent

The present invention relates to a technique that defines a bio-signaling network for predicting the survival and death of a cell in response to external stimuli applied to the cell, and that uses a computing device to implement a method of optimizing the parameters included in the bio-signaling network.

A cell may survive or die. The various proteins in a cell may contribute to the survival or death of the cell while influencing one another. Each protein in a set of proteins in the cell may, depending on its expression level, affect the expression levels of the other proteins. A meaningful network representing the relationships among such a set of proteins can be constructed, and it may be referred to as a bio-signaling network.

A bio-signaling network may consist of nodes and links connecting the nodes. Each node may represent a specific protein present in the cell. A weight may be assigned to each link. The weight may indicate the degree or strength of the influence that the expression level of a first protein, represented by the first node connected to the first end of the corresponding link, has on the expression level of a second protein, represented by the second node connected to the second end of that link.
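The node-and-link structure described above can be sketched as a small data structure. This is only an illustration: the class name `SignalingNetwork`, the node names `A`, `B`, `C`, and the use of a negative weight to stand for an inhibiting influence are assumptions, not details given in the specification.

```python
from dataclasses import dataclass, field


@dataclass
class SignalingNetwork:
    """Nodes stand for proteins; each directed link carries a weight."""
    nodes: list[str]
    # Maps (source node, target node) -> weight of that link.
    weights: dict[tuple[str, str], float] = field(default_factory=dict)

    def add_link(self, source: str, target: str, weight: float) -> None:
        if source not in self.nodes or target not in self.nodes:
            raise ValueError("both endpoints must be nodes of the network")
        self.weights[(source, target)] = weight

    def influence_on(self, target: str) -> dict[str, float]:
        """Return {source: weight} for every link that ends at `target`."""
        return {s: w for (s, t), w in self.weights.items() if t == target}


# Hypothetical three-protein network.
net = SignalingNetwork(nodes=["A", "B", "C"])
net.add_link("A", "B", 0.8)   # A influences the expression level of B
net.add_link("C", "B", -0.5)  # assumed convention: negative = inhibiting
```

Keeping the weights in a separate mapping makes it straightforward to replace all of them at once, which is what the weight-determination agent described later does.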

Various bio-signaling networks may be defined for a single cell. One of these networks may be specifically associated with the emergence and death of a particular cancer cell, and another may be specifically associated with the emergence and death of a different cancer cell.

When the cell is normal, a specific bio-signaling network defined for the cell may be referred to as a nominal bio-signaling network. The state of the cell may be determined by the combination of the values of the nodes of the nominal bio-signaling network. The value of each node may be determined by a state transition equation that governs the time dynamics of the nominal bio-signaling network. The state transition equation may depend on the weight of each link.

When a mutation occurs in the cell and the normal cell turns into a cancer cell, some nodes of the nominal bio-signaling network may no longer follow the state transition equation and may instead exhibit different time dynamics. For example, a mutated node may always hold one specific value regardless of the passage of time. Such a mutated bio-signaling network may be referred to as a cancer cell bio-signaling network. The cancer cell bio-signaling network may have the property that the cancer cell does not die over time. When a specific drug is administered to the cancer cell, the drug may affect the expression level of a specific node of the cancer cell bio-signaling network, and the resulting chain of effects may cause the cancer cell to die. In this case, the specific drug is said to perturb the specific node. A drug may perturb one node or a plurality of nodes. Finding a drug that leads the cancer cell to death can be beneficial for cancer treatment.
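The behavior described above (nodes following a weight-dependent state transition equation, a mutated node held at a fixed value, and a drug perturbing a node) can be illustrated with a toy simulation. The specification does not give the state transition equation here, so the logistic update below, the node names `ONC`, `SURV`, and `DEATH`, and the modeling of a drug as pinning its target node are all assumptions made for this sketch.

```python
import math


def step(state, weights, pinned):
    """One synchronous update; pinned nodes keep their fixed value."""
    new_state = {}
    for node in state:
        if node in pinned:
            # Mutated or drug-perturbed node: ignores the transition equation.
            new_state[node] = pinned[node]
            continue
        total = sum(w * state[src]
                    for (src, dst), w in weights.items() if dst == node)
        # Assumed transition equation: logistic function of the weighted input.
        new_state[node] = 1.0 / (1.0 + math.exp(-total))
    return new_state


def simulate(state, weights, pinned, n_steps=50):
    for _ in range(n_steps):
        state = step(state, weights, pinned)
    return state


# Hypothetical network: ONC activates SURV, and SURV suppresses DEATH.
weights = {("ONC", "SURV"): 4.0, ("SURV", "DEATH"): -4.0}
start = {"ONC": 0.5, "SURV": 0.5, "DEATH": 0.5}

# Mutation: ONC is stuck at 1.0, which keeps DEATH low.
cancer = simulate(start, weights, pinned={"ONC": 1.0})
# Drug: additionally perturbs SURV by holding it at 0.0.
treated = simulate(start, weights, pinned={"ONC": 1.0, "SURV": 0.0})
```

In this toy setting the `DEATH` node ends up higher in `treated` than in `cancer`, mirroring the chain of effects that the text attributes to a drug perturbation.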

The optimal drug can be found through experiments in which various drugs are administered one after another to actual cancer cells, but this approach requires considerable cost and time. In the meantime the patient's condition may deteriorate, and the treatment may ultimately fail. If simulation on a computing device can be used to find a suitable drug for a cancer patient quickly, it can greatly help the treatment of cancer patients.

Several such simulation methods have been disclosed.

Some of these previously disclosed techniques use the bio-signaling network described above. In that case, the reliability of the simulation result may be determined by the weights assigned to the links of the bio-signaling network, so finding the optimal weights is important. The optimal weights may be chosen based on the experience and insight of the researcher who designs the simulation method, but this approach can be expected to have limits.

Accordingly, the present invention aims to provide a technique that uses machine learning to determine the optimal weights.

Prior art related to bio-signaling networks and to methods of determining target drugs using them is presented in Korean Patent Application Nos. 10-2109-0100505, 10-2018-0154390, 10-2107-0044192, 10-2017-0180959, and 10-2013-0033843, among others.

The present invention aims to provide a technique for determining the weights associated with the links of a cancer cell bio-signaling network that can be applied in common to various types of cancer and various patients, regardless of the type of cancer and the location of the mutation.

The present invention aims to provide a technique that uses machine learning to optimize the parameters of a bio-signaling network modeled so that biological meaning can be attributed to its internal structure.

The present invention aims to provide a technique for training an agent (a weight-determination agent) that determines the weights assigned to the links of a bio-signaling network composed of nodes and links. It also aims to provide a technique for selecting a drug suitable for treating a new cancer patient by using the trained agent.

According to one aspect of the present invention, there may be provided an agent that determines the weights assigned to the links of a drug response network composed of nodes and links. The network can output a more accurate cell death probability only when the weights assigned to the links of the drug response network are set to appropriate values.

The agent includes a trainable network, such as a machine learning network or a neural network, and may include a plurality of layers.

To train the agent, a set of training cancer cell lines and a plurality of drugs (drug combinations) may be used as training data. For one learning step, information on one set of training cancer cell lines (N cancer cell lines) and one drug may be used.

The death rates of the set of training cancer cell lines may be determined by observing them when the drug is administered to the set of training cancer cell lines in an in vitro experiment. The death rates may be presented, for example, as a vector Z consisting of N scalar values.

A set of specific perturbation networks may be generated by applying the mutation information of the set of training cancer cell lines to the drug response network. A set of death probabilities may then be computed from the set of specific perturbation networks. The death probabilities may be presented, for example, as a vector Y consisting of N scalar values.

A reward calculation unit provided according to one aspect of the present invention may calculate a reward, which is a value to be input to the agent. The reward calculation unit may calculate the reward using the distance between the vector Y and the vector Z.
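A minimal sketch of a distance-based reward follows. The text only states that the distance between Y and Z is used; the Euclidean distance, the reciprocal form, the function name `closeness`, and the small `eps` that guards against division by zero are assumptions chosen for illustration.

```python
import math


def closeness(y, z, eps=1e-8):
    """Larger when predicted death probabilities Y match observed rates Z.

    Assumed form: reciprocal of the Euclidean distance, with `eps`
    added so that a perfect match does not divide by zero.
    """
    if len(y) != len(z):
        raise ValueError("Y and Z must both have N entries")
    return 1.0 / (math.dist(y, z) + eps)


# Hypothetical values for N = 2 cancer cell lines.
observed = [0.2, 0.9]                        # vector Z (in vitro death rates)
good = closeness([0.25, 0.85], observed)     # prediction close to Z
poor = closeness([0.80, 0.10], observed)     # prediction far from Z
```

With these inputs `good` is larger than `poor`, so a prediction that tracks the experiment more closely earns the larger value.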

The agent may receive, as input data, the reward and the weights that were assigned to the links of the drug response network. Based on this input data, the agent may output updated values of the weights to be assigned to the links of the drug response network in the next learning step.

In this specification, the term 'learning step' refers to updating the weights of the drug response network. By contrast, for the agent to be trained once, the learning step needs to be executed a plurality of times.

A set of learning steps executed consecutively may be referred to as a learning episode. For all learning steps within one episode, the drug used as training data may be limited to a single drug. The training drug can be changed only when the episode changes.

In each learning step, the values of the weights of the drug response network may be updated once. When the episode is executed once, the agent may be trained once. Each time an episode is repeated, the amount of training of the agent increases.
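The relationship between learning steps and an episode can be sketched as a loop. Everything concrete here is an assumption: `predict_death_probability` is a logistic stand-in for a specific perturbation network, the placeholder agent ignores the reward and merely adds random noise (it is not a trained reinforcement learning policy), and the reward uses the difference of reciprocal distances between consecutive steps, with the first step set to zero for lack of a predecessor.

```python
import math
import random


def predict_death_probability(weights, mutation):
    """Stand-in for a specific perturbation network: a logistic score."""
    score = sum(w * m for w, m in zip(weights, mutation))
    return 1.0 / (1.0 + math.exp(-score))


def make_placeholder_agent(seed=0, scale=0.1):
    """Return a function (reward, weights) -> new weights.

    Placeholder only: it ignores the reward and adds Gaussian noise.
    """
    rng = random.Random(seed)

    def agent(reward, weights):
        return [w + rng.gauss(0.0, scale) for w in weights]

    return agent


def run_episode(agent, weights, mutations, z, n_steps):
    """Run `n_steps` learning steps for one drug and record each step."""
    history = []
    previous_value = None
    for _ in range(n_steps):
        # Step 1: predicted death probabilities (vector Y) and the reward.
        y = [predict_death_probability(weights, m) for m in mutations]
        value = 1.0 / (math.dist(y, z) + 1e-8)
        reward = 0.0 if previous_value is None else value - previous_value
        # Step 2: the agent proposes new weights from reward and weights.
        new_weights = agent(reward, weights)
        history.append({"reward": reward, "weights": new_weights})
        # Step 3: the network's link weights are replaced by the new ones.
        weights = new_weights
        previous_value = value
    return history


history = run_episode(
    make_placeholder_agent(),
    weights=[0.0, 0.0],
    mutations=[[1, 0], [0, 1], [1, 1]],  # hypothetical mutation encodings
    z=[0.9, 0.1, 0.5],                   # hypothetical in vitro death rates
    n_steps=5,
)
```

The weights change once per learning step, while the recorded `history` of one whole episode is what would be handed to the agent for a single round of training; that training update is not shown here.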

The sufficiently trained agent may be used to select a drug for killing a new cancer cell line.

According to one aspect of the present invention, there may be provided a non-volatile computer-readable recording medium on which a program is recorded, the program including instructions that cause a computing device to determine the weights of the links of a drug response network by executing a learning step that includes: a first step of obtaining the death probability of a cancer cell line predicted by a specific perturbation network generated by applying mutation information of the cancer cell line to the drug response network, which responds to a specific drug, obtaining the death rate of the cancer cell line measured in an in vitro experiment in which the specific drug is administered to the cancer cell line, and calculating a reward value that reflects the difference between the death probability and the death rate; a second step of inputting the reward value to an agent to calculate new weights for the links of the drug response network; and a third step of updating the weights of the links of the drug response network to the new weights.

Here, the program may further include an instruction that causes the computing device to train the agent once based on the plurality of rewards and the plurality of new weights obtained while executing the learning step a plurality of times.

Here, the first step may be configured so that, for each of N prepared cancer cell lines (p = 1, 2, 3, ..., N), it obtains the death probability y_p of cancer cell line [p], the p-th of the N cancer cell lines, predicted by a p-th specific perturbation network generated by applying the mutation information of cancer cell line [p] to the drug response network, and obtains the death rate z_p of cancer cell line [p] measured in an in vitro experiment in which the specific drug is administered to cancer cell line [p]. The first step may include calculating the reward value based on a first value that is inversely proportional to the distance between a vector Y consisting of the death probabilities obtained for the N cancer cell lines and a vector Z consisting of the death rates obtained for the N cancer cell lines.

Here, the first step may be configured so that, for each of N prepared cancer cell lines (p = 1, 2, 3, ..., N), it obtains the death probability y_p of cancer cell line [p], the p-th of the N cancer cell lines, predicted by a p-th specific perturbation network generated by applying the mutation information of cancer cell line [p] to the drug response network, and obtains the death rate z_p of cancer cell line [p] measured in an in vitro experiment in which the specific drug is administered to cancer cell line [p]. The first step may include: calculating a first value that is inversely proportional to the distance between a vector Y consisting of the death probabilities obtained for the N cancer cell lines and a vector Z consisting of the death rates obtained for the N cancer cell lines; and calculating the reward value based on the difference between the first value and a previously prepared second value. The second value may be a value inversely proportional to the distance between the past vector Y and the past vector Z constructed in the learning step completed immediately before the first value is calculated.
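The difference-based reward described above can be sketched as follows. The proportionality constant (taken as 1), the Euclidean distance, the `eps` guard, and the function names are assumptions; the text only requires that the first and second values be inversely proportional to the distance between Y and Z in the current and the immediately preceding learning step.

```python
import math


def inverse_distance(y, z, eps=1e-8):
    """Value inversely proportional to the distance between Y and Z.

    `eps` is an assumed guard against division by zero.
    """
    return 1.0 / (math.dist(y, z) + eps)


def improvement_reward(y_now, z_now, y_prev, z_prev):
    """Reward = first value (current step) - second value (previous step)."""
    first_value = inverse_distance(y_now, z_now)
    second_value = inverse_distance(y_prev, z_prev)
    return first_value - second_value


# Hypothetical numbers for N = 2 cancer cell lines.
z = [0.1, 0.9]            # observed death rates, unchanged between steps
y_prev = [0.5, 0.5]       # prediction in the previous learning step
y_now = [0.2, 0.8]        # prediction in the current learning step

reward = improvement_reward(y_now, z, y_prev, z)
```

Under this form the reward is positive when the latest weight update moved the predictions closer to the experiment and negative when it moved them away, so the agent is rewarded for improvement rather than for absolute accuracy.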

According to another aspect of the present invention, a computing device including a processing unit and a storage unit may be provided. The processing unit is configured to execute a step of performing an episode, which is a process of training an agent that determines the weights of the links of a drug response network that responds to a specific drug. In performing one episode, the processing unit executes a predetermined learning step a plurality of times. The learning step includes: a first step of obtaining the death probability of a cancer cell line predicted by a specific perturbation network generated by applying mutation information of the cancer cell line to the drug response network, obtaining the death rate of the cancer cell line measured in an in vitro experiment in which the specific drug is administered to the cancer cell line, and calculating a reward value that reflects the difference between the death probability and the death rate; a second step of inputting the reward value to the agent to calculate new weights for the links of the drug response network; and a third step of updating the weights of the links of the drug response network to the new weights.

Here, the first step may be configured so that, for each of N prepared cancer cell lines (p = 1, 2, 3, ..., N), it obtains the death probability y_p of cancer cell line [p], the p-th of the N cancer cell lines, predicted by a p-th specific perturbation network generated by applying the mutation information of cancer cell line [p] to the drug response network, and obtains the death rate z_p of cancer cell line [p] measured in an in vitro experiment in which the specific drug is administered to cancer cell line [p]. The first step may include calculating the reward value based on a first value that is inversely proportional to the distance between a vector Y consisting of the death probabilities obtained for the N cancer cell lines and a vector Z consisting of the death rates obtained for the N cancer cell lines.

Here, the first step may include: calculating a first value that is inversely proportional to the distance between a vector Y consisting of the death probabilities obtained for the N cancer cell lines and a vector Z consisting of the death rates obtained for the N cancer cell lines; and calculating the reward value based on the difference between the first value and a previously prepared second value. The second value may be a value inversely proportional to the distance between the past vector Y and the past vector Z constructed in the learning step completed immediately before the first value is calculated.

According to one aspect of the present invention, there may be provided a method of generating a bio-signaling network, the method including a step in which a computing device executes a predetermined learning step. The learning step includes: a first step in which the computing device obtains the death probability of a cancer cell line predicted by a specific perturbation network generated by applying mutation information of the cancer cell line to a drug response network that responds to a specific drug, obtains the death rate of the cancer cell line measured in an in vitro experiment in which the specific drug is administered to the cancer cell line, and calculates a reward value that reflects the difference between the death probability and the death rate; a second step in which the computing device inputs the reward value to an agent to calculate new weights for the links of the drug response network; and a third step in which the computing device updates the weights of the links of the drug response network to the new weights.

Here, the learning step may be executed repeatedly. The first step may be configured so that, for each of N prepared cancer cell lines (p = 1, 2, 3, ..., N), it obtains the death probability y_p of cancer cell line [p], the p-th of the N cancer cell lines, predicted by a p-th specific perturbation network generated by applying the mutation information of cancer cell line [p] to the drug response network, and obtains the death rate z_p of cancer cell line [p] measured in an in vitro experiment in which the specific drug is administered to cancer cell line [p]. The first step may include: calculating a first value that is inversely proportional to the distance between a vector Y consisting of the death probabilities obtained for the N cancer cell lines and a vector Z consisting of the death rates obtained for the N cancer cell lines; and calculating the reward value based on the difference between the first value and a previously prepared second value. The second value may be a value inversely proportional to the distance between the past vector Y and the past vector Z constructed in the learning step completed immediately before the first value is calculated.

According to one aspect of the present invention, there may be provided a method of determining candidate drugs for cancer treatment, the method including: generating, by a simulation device (810), a plurality of specific perturbation networks by reflecting mutation information of a first cancer cell line in each of a plurality of drug response networks for a plurality of drugs; selecting, by the simulation device, a plurality of candidate drugs from among the plurality of drugs based on a plurality of death probabilities of the first cancer cell line output by the plurality of specific perturbation networks; providing information on the selected plurality of candidate drugs to a drug response screening device; executing, by the drug response screening device, an in vitro experiment in which the plurality of candidate drugs are administered to a plurality of wells containing the first cancer cell line; capturing and analyzing, by the drug response screening device, images of the first cancer cell line in the plurality of wells using a cell imaging device; and outputting, by the drug response screening device, in vitro experiment results for at least some of the plurality of candidate drugs based on the analysis.
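The candidate selection step can be illustrated with a short ranking routine. The text says only that candidates are selected "based on" the death probabilities; ranking by highest predicted death probability and keeping a fixed number `top_k` is an assumed selection rule, and the drug names and probabilities are hypothetical.

```python
def select_candidates(death_probabilities, top_k):
    """Return the `top_k` drugs with the highest predicted death probability.

    `death_probabilities` maps a drug name to the death probability that
    the corresponding specific perturbation network output for the
    first cancer cell line.
    """
    ranked = sorted(death_probabilities.items(),
                    key=lambda item: item[1], reverse=True)
    return [drug for drug, _ in ranked[:top_k]]


# Hypothetical simulator outputs for one cancer cell line.
predicted = {"drug_a": 0.2, "drug_b": 0.9, "drug_c": 0.6}
candidates = select_candidates(predicted, top_k=2)
```

Only the shortlisted drugs would then be passed to the drug response screening device, which reduces the number of wells needed for the in vitro experiment.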

Here, before the generating step, the method may further include a step in which a computing device (710) performs a process (= episode) of determining the weights of a k-th drug response network, which responds to a k-th drug, among the plurality of drug response networks. The step of performing the process may use an agent whose training by reinforcement learning has been completed. The step of performing the process may include: obtaining, by the computing device, mutation information of N (= p_k) cell lines for which information on responsiveness from an in vitro experiment using the k-th drug among the plurality of drugs exists; generating, by the computing device, N specific perturbation networks by applying the N pieces of mutation information to the k-th drug response network that responds to the k-th drug; repeating, by the computing device, a learning step of updating the weights of the links of the k-th drug response network a plurality of times using the agent; selecting, by the computing device, the learning step in which the reward provided to the agent has the largest value among the plurality of learning steps; and determining, by the computing device, the link weights output by the agent in the selected learning step to be the weights of the links of the k-th drug response network.
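Choosing the final weights from an episode amounts to picking the learning step with the largest reward. The record layout below (a list of dictionaries with `reward` and `weights` keys) and the numbers in it are hypothetical; they only illustrate the selection rule stated above.

```python
def select_best_weights(history):
    """Return the weights the agent output in the highest-reward step.

    `history` holds one record per learning step of a single episode.
    """
    if not history:
        raise ValueError("the episode must contain at least one learning step")
    best_step = max(history, key=lambda record: record["reward"])
    return best_step["weights"]


# Hypothetical episode of three learning steps for the k-th drug.
episode = [
    {"reward": 0.10, "weights": [0.3, -0.2]},
    {"reward": 0.45, "weights": [0.5, -0.1]},
    {"reward": -0.05, "weights": [0.6, 0.2]},
]
final_weights = select_best_weights(episode)
```

Selecting by maximum reward rather than simply taking the last step means the result does not depend on the episode ending at a favorable moment.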

Here, the agent may be configured to determine the weights of the links of the k-th drug response network for the next learning step based on the reward and on the weights of the links of the k-th drug response network in the current learning step.

Here, the process of determining the reward may include: preparing, by the computing device, in the current learning step among the plurality of learning steps, a vector Y consisting of the N death probabilities output by the N specific perturbation networks and a vector Z consisting of N values relating to the death rate of the first cancer cell line observed in an in vitro experiment in which the k-th drug is administered to the first cancer cell line; calculating, by the computing device, a first value that is inversely proportional to the distance between the vector Y and the vector Z; and calculating, by the computing device, the reward based on the difference between the first value and a second value. The second value may be a value inversely proportional to the distance between the vector Y and the vector Z prepared in the learning step immediately preceding the current learning step.

Here, before the step of performing the process (= episode) of determining the weights of the k-th drug response network, the method may further include a step in which the computing device (710) trains the agent. In the step of training the agent, a process (= episode) of training the agent may be repeated for G different drugs. The training process of the agent performed for a g-th drug may include: obtaining, by the computing device (710), mutation information of p_g cell lines for which information on responsiveness from an in vitro experiment using the g-th drug exists; generating, by the computing device, p_g specific perturbation networks by applying the p_g pieces of mutation information to the g-th drug response network that responds to the g-th drug; repeating, by the computing device, a learning step of updating the weights of the links of the g-th drug response network a plurality of times using the agent; and training, by the computing device, the agent using the reward values and the weights obtained while repeating the learning step the plurality of times.

In this case, the agent may be configured to determine the weights of the links of the g-th drug responsiveness network in the next learning step based on the reward and the weights of the links of the g-th drug responsiveness network in the current learning step.

According to another aspect of the present invention, a method for determining candidate drugs for cancer treatment may be provided, the method including: performing, by a computing device 710, a process (= episode) of determining the weights of a k-th drug responsiveness network that responds to a k-th drug among a plurality of drug responsiveness networks; generating, by a simulation device 810, a plurality of specific perturbation networks by reflecting mutation information of a first cancer cell line in each of the plurality of drug responsiveness networks for a plurality of drugs; and selecting, by the simulation device, a plurality of candidate drugs from among the plurality of drugs based on a plurality of death probabilities of the first cancer cell line output by the plurality of specific perturbation networks. The step of performing the process uses an agent whose training by reinforcement learning has been completed. The step of performing the process includes: obtaining, by the computing device, mutation information of N (= p_k) cell lines for which information on reactivity from in vitro experiments using the k-th drug among the plurality of drugs exists; generating, by the computing device, N specific perturbation networks by applying the N pieces of mutation information to the k-th drug responsiveness network that responds to the k-th drug; repeating, by the computing device, a learning step of updating the weights of the links of the k-th drug responsiveness network a plurality of times using the agent; selecting, by the computing device, the learning step in which the reward provided to the agent has the largest value among the plurality of learning steps; and determining, by the computing device, the link weights output by the agent in the selected learning step as the weights of the links of the k-th drug responsiveness network.
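The episode described above (repeat the learning step, then keep the link weights from the step with the largest reward) can be sketched as follows. The names `run_episode`, `toy_agent_step`, and `toy_reward` are hypothetical; the two toy functions are deliberately trivial stand-ins for the trained agent and for the reward calculation, which are not reproduced here.

```python
def run_episode(agent_step, reward_fn, initial_weights, n_steps):
    """Repeat the learning step n_steps times and return the weights
    (and reward) of the step whose reward was the largest."""
    weights = list(initial_weights)
    last_reward = 0.0
    best_reward = float("-inf")
    best_weights = list(weights)
    for _ in range(n_steps):
        # The agent proposes new link weights from the current weights and reward.
        weights = agent_step(weights, last_reward)
        last_reward = reward_fn(weights)
        if last_reward > best_reward:
            best_reward = last_reward
            best_weights = list(weights)
    return best_weights, best_reward


def toy_agent_step(weights, last_reward):
    # Hypothetical stand-in for the trained agent: nudges every weight upward.
    return [w + 0.1 for w in weights]


def toy_reward(weights):
    # Hypothetical stand-in reward: largest when the first weight is near 0.3.
    return -abs(weights[0] - 0.3)


best_w, best_r = run_episode(toy_agent_step, toy_reward, [0.0, 0.0], n_steps=5)
```

Tracking the best step inside the loop avoids storing the whole trajectory when only the final weights are needed; a training episode that also updates the agent would keep the full list of rewards and weights instead.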

According to another aspect of the present invention, a system for determining candidate drugs for cancer treatment may be provided, the system including a simulation device 810, a drug response screening device 600, and a computing device 710. The simulation device is configured to generate a plurality of specific perturbation networks by reflecting mutation information of a first cancer cell line in each of a plurality of drug responsiveness networks for a plurality of drugs, to select a plurality of candidate drugs from among the plurality of drugs based on a plurality of death probabilities of the first cancer cell line output by the plurality of specific perturbation networks, and to provide information on the selected candidate drugs to the drug response screening device. The drug response screening device is configured to execute an in vitro experiment in which the plurality of candidate drugs are administered to a plurality of wells in which the first cancer cell line is stored, to capture and analyze images of the first cancer cell line in the plurality of wells using a cell imaging device, and to output in vitro experimental results for at least some of the plurality of candidate drugs based on the result of the analysis.

In this case, the computing device may be configured to perform, before the simulation device generates the plurality of specific perturbation networks, a process (= episode) of determining the weights of the k-th drug responsiveness network that responds to the k-th drug among the plurality of drug responsiveness networks. The process may use an agent whose training by reinforcement learning has been completed. The process may include: obtaining, by the computing device, mutation information of N (= p_k) cell lines for which information on reactivity from in vitro experiments using the k-th drug among the plurality of drugs exists; generating, by the computing device, N specific perturbation networks by applying the N pieces of mutation information to the k-th drug responsiveness network that responds to the k-th drug; repeating, by the computing device, a learning step of updating the weights of the links of the k-th drug responsiveness network a plurality of times using the agent; selecting, by the computing device, the learning step in which the reward provided to the agent has the largest value among the plurality of learning steps; and determining, by the computing device, the link weights output by the agent in the selected learning step as the weights of the links of the k-th drug responsiveness network.

In this case, the agent may be configured to determine the weights of the links of the k-th drug responsiveness network in the next learning step based on the reward and the weights of the links of the k-th drug responsiveness network in the current learning step.

In this case, the process of determining the reward may include: preparing, by the computing device, in the current learning step among the plurality of learning steps, a vector Y consisting of the N death probabilities output by the N specific perturbation networks and a vector Z consisting of N values relating to the death rate of the first cancer cell line observed in in vitro experiments in which the k-th drug is administered to the first cancer cell line; calculating, by the computing device, a first value that is inversely proportional to the distance between the vector Y and the vector Z; and calculating, by the computing device, the reward based on the difference between the first value and a second value. The second value may be a value inversely proportional to the distance between the vector Y and the vector Z prepared in the learning step immediately preceding the current learning step.

In this case, the computing device 710 may be configured to train the agent before performing the process (= episode) of determining the weights of the k-th drug responsiveness network. In training the agent, a process (= episode) of training the agent may be repeated for G different drugs. The agent training process performed for the g-th drug may include: obtaining, by the computing device 710, mutation information of p_g cell lines for which information on reactivity from in vitro experiments using the g-th drug exists; generating, by the computing device, p_g specific perturbation networks by applying the p_g pieces of mutation information to the g-th drug responsiveness network that responds to the g-th drug; repeating, by the computing device, a learning step of updating the weights of the links of the g-th drug responsiveness network a plurality of times using the agent; and training, by the computing device, the agent using the reward values and the weights obtained while repeating the learning step the plurality of times.

According to yet another aspect of the present invention, a system for determining candidate drugs for cancer treatment may be provided, the system including a simulation device 810, a drug response screening device 600, and a computing device 710. The computing device is configured to perform a process (= episode) of determining the weights of a k-th drug responsiveness network that responds to a k-th drug among a plurality of drug responsiveness networks. The simulation device is configured to generate a plurality of specific perturbation networks by reflecting mutation information of a first cancer cell line in each of the plurality of drug responsiveness networks for a plurality of drugs, and to select a plurality of candidate drugs from among the plurality of drugs based on a plurality of death probabilities of the first cancer cell line output by the plurality of specific perturbation networks. The process uses an agent whose training by reinforcement learning has been completed. The process includes: obtaining, by the computing device, mutation information of N (= p_k) cell lines for which information on reactivity from in vitro experiments using the k-th drug among the plurality of drugs exists; generating, by the computing device, N specific perturbation networks by applying the N pieces of mutation information to the k-th drug responsiveness network that responds to the k-th drug; repeating, by the computing device, a learning step of updating the weights of the links of the k-th drug responsiveness network a plurality of times using the agent; selecting, by the computing device, the learning step in which the reward provided to the agent has the largest value among the plurality of learning steps; and determining, by the computing device, the link weights output by the agent in the selected learning step as the weights of the links of the k-th drug responsiveness network.

According to the existing technology, the weights of a cancer cell bio-signaling network defined to find the optimal drug to administer to a specific cancer patient were determined by jointly considering the type of the patient's cancer and the positions of the mutations that specifically occurred in that patient. Therefore, a cancer cell bio-signaling network had to be defined individually for each cancer patient.

According to the present invention, however, it is possible to determine the weights associated with the links of a cancer cell bio-signaling network that can be applied in common to various types of cancer and various patients, regardless of the type of cancer and the positions of the mutations.

According to the present invention, the parameters of a bio-signaling network that is modeled so that biological meaning can be assigned to its internal structure can be optimized through machine learning. Therefore, machine learning can be used not only to select the optimal drug for cancer treatment but also to generate data suitable for interpreting the biological significance of the parameters determined through machine learning.

The present invention provides a technique for training an agent that determines the weights assigned to the links of a bio-signaling network composed of nodes and links. It can also provide a technique for selecting a drug suitable for treating a new cancer patient by using the trained agent.

FIG. 1A illustrates the concept of a bio-signaling network.
FIG. 1B illustrates the concept of a drug responsiveness network, which is a concept used in the present invention.
FIG. 1C shows a drug responsiveness network in which cell mutation information is reflected.
FIG. 2A illustrates a method of defining and generating a plurality of different specific perturbation networks from a specific drug responsiveness network according to an embodiment of the present invention.
FIG. 2B describes the method of generating the specific perturbation networks of FIG. 2A in another way.
FIG. 3 illustrates a method of determining the weights assigned to the links of a specific drug responsiveness network according to an embodiment of the present invention.
FIG. 4 is a block diagram explaining the function of a reward calculation unit that, for the execution of one learning step according to an embodiment of the present invention, calculates a reward value using the predicted death probability y_p calculated for cancer cell line [p] and the observed value z_p of the actual death rate of cancer cell line [p] when drug [k] is administered to cancer cell line [p].
FIG. 5 is a block diagram illustrating a process of updating the weights assigned to the links of the nominal network using the calculated reward.
FIG. 6 is a flowchart illustrating a method of updating, through one learning step provided by an embodiment of the present invention, the weights assigned to the links of a drug responsiveness network related to a specific drug.
FIG. 7 shows a method of determining the weights of a drug responsiveness network as optimal values using the weight updating method described in FIG. 6.
FIG. 8 illustrates the concept of determining a plurality of different drug responsiveness networks from a given nominal network.
FIG. 9 illustrates a process of finding a drug suitable for patient [x] using the determined plurality of different drug responsiveness networks according to an embodiment of the present invention.
FIG. 10 shows a process of determining the optimal drug for patient [x] using the K specific perturbation networks [x][k] prepared as shown in FIG. 9.
FIG. 11 illustrates the configuration of a computing device that executes a method of completing a drug responsiveness network by determining its weights according to an embodiment of the present invention.
FIG. 12 illustrates the configuration of a computing device that executes a simulation method for determining the optimal drug effective in killing a specific cancer cell line according to an embodiment of the present invention.
FIG. 13 shows the structure of a system for determining candidate drugs for cancer treatment provided according to an embodiment of the present invention.
FIG. 14 is a framework illustrating a method of training the completed agent presented in FIG. 3.
FIG. 15 shows the configuration of a system that obtains and provides a death rate Z by administering a specific drug to cancer cell lines according to an embodiment of the present invention.
FIG. 16A shows an example of a business model using the present invention.
FIG. 16B shows another example of a business model using the present invention.

Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. However, the present invention is not limited to the embodiments described herein and may be implemented in various other forms. The terms used in this specification are intended to aid understanding of the embodiments and are not intended to limit the scope of the present invention. In addition, the singular forms used below include the plural forms unless the wording clearly indicates the contrary.

FIG. 1A illustrates the concept of a bio-signaling network.

In this specification, the terms 'bio-signaling network' and 'biological signal transduction network' are used interchangeably.

Reference number 500 conceptually shows the structure of a specific bio-signaling network of a normal cell. What reference number 500 denotes may be referred to as a 'nominal network'.

In one embodiment, the bio-signaling network may consist of a plurality of nodes and a plurality of links connecting them. Each node represents the activity of a protein in the cell and may be modeled to take either a binary value or a real value. Each link represents the effect that the activity of the first node at the link's start point has on the activity of the second node at the link's end point (drawn as an arrow or a square). A link whose end point is drawn as an arrow indicates that the activity of the first node has a positive effect on the activity of the second node, and a link whose end point is drawn as a square indicates that the activity of the first node has a negative effect on the activity of the second node. A weight is assigned to each link, and this weight may represent the strength of the positive or negative effect. The structure of the bio-signaling network may be constructed using knowledge established by existing research in the field of biomolecules.
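The node-and-link model described above can be represented with a small data structure such as the one below. This is only a sketch: the `Link` class, the `signed_input` helper, and the weighted-sum combination are assumptions made for illustration, and they are not the state transition equations of the network, which are defined elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: int    # first node (start point of the link)
    target: int    # second node (end point of the link)
    sign: int      # +1 for an arrow (positive effect), -1 for a square (negative effect)
    weight: float  # strength of the effect


def signed_input(links, activity, node):
    """Sum of sign * weight * source activity over all links ending at `node`.

    Assumption: a plain weighted sum, used here only to show how the sign
    and weight of each link could enter a computation.
    """
    return sum(
        link.sign * link.weight * activity[link.source]
        for link in links
        if link.target == node
    )


# Hypothetical three-node example: node 1 activates node 3, node 2 inhibits it.
links = [Link(1, 3, +1, 0.5), Link(2, 3, -1, 0.25)]
activity = {1: 1.0, 2: 1.0, 3: 0.0}
net_effect_on_3 = signed_input(links, activity, 3)
```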

The modeling method may be selected from among a plurality of methods. The number of link types and the way they are represented may differ slightly depending on the modeling method.

In this specification, when a mutation exists at a specific node of the nominal network, the network in which the mutation exists may be referred to as a specific network. That is, a specific network may mean a nominal network in which cell mutation information is reflected.

Reference number 510 denotes a 'first specific network', i.e., 'specific network [1]', which represents the case in which a node corresponding to a mutation present in a first cancer cell line, i.e., cancer cell line [1], exists in the nominal network. Nodes in which a mutation exists are shown in black.

Reference number 520 denotes a 'second specific network', i.e., 'specific network [2]', which represents the case in which a node corresponding to a mutation present in a second cancer cell line, i.e., cancer cell line [2], exists in the nominal network. Nodes in which a mutation exists are shown in black.

The cancer cell line [k] may also be described using the concept and term cancer cell [k] instead.

In this way, when a node corresponding to a mutation present in cancer cell line [k] exists in the nominal network, the resulting network may be referred to as 'specific network [k]'.

Reference number 521 denotes 'specific perturbation network [2]', which represents the case in which, when a specific drug is administered to cancer cell line [2], target nodes whose expression levels are affected by the specific drug exist in specific network [2]. Nodes in which a mutation exists are shown in black, and the two target nodes are shown in gray.

FIG. 1B illustrates the concept of a drug responsiveness network, which is a concept used in the present invention.

A plurality of different drug responsiveness networks can be defined from the nominal network 500. Each drug responsiveness network can be regarded as a subnetwork made up of some of the components of the nominal network 500.

FIG. 1B shows a first drug responsiveness network 500[1] composed of the nodes numbered 3, 5, 6, and 7, and a second drug responsiveness network 500[2] composed of the nodes numbered 1, 2, and 3.
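The two subnetworks named above can be illustrated with a short sketch. The link list below is hypothetical (the actual topology is the one drawn in FIG. 1B and is not reproduced here), and treating a drug responsiveness network as the subnetwork induced by its node set is an assumption made for illustration.

```python
def induced_subnetwork(links, nodes):
    """Keep only the links whose start and end nodes both belong to `nodes`."""
    keep = set(nodes)
    return [(s, t) for (s, t) in links if s in keep and t in keep]


# Hypothetical directed links of a 7-node nominal network.
nominal_links = [(1, 2), (2, 3), (3, 5), (4, 5), (5, 6), (6, 7), (7, 3)]

# Node sets taken from the text: 500[1] = {3, 5, 6, 7}, 500[2] = {1, 2, 3}.
network_1 = induced_subnetwork(nominal_links, {3, 5, 6, 7})
network_2 = induced_subnetwork(nominal_links, {1, 2, 3})

# Each drug responsiveness network keeps its own weight table, so weights can
# be determined independently per network (initial value 1.0 is a placeholder).
weights = {
    1: {link: 1.0 for link in network_1},
    2: {link: 1.0 for link in network_2},
}
```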

Although FIG. 1B shows only two drug responsiveness networks defined from the nominal network 500, it is readily understood that more drug responsiveness networks can be defined. For example, a k-th drug responsiveness network 500[k], not shown in FIG. 1B, may also be defined.

In the k-th drug responsiveness network 500[k], the state transition equations that determine the state value of each node at each point in time may already be defined. Techniques relating to such state transition equations are exemplified in, for example, Korean Patent Registration Nos. KR 10-2029297 and KR 10-1975424.

In this case, at least some of the coefficients included in the state transition equations may be determined by the weights assigned to the links of the k-th drug responsiveness network 500[k]. These weights should be set to optimal values. Determining the optimal weight value to be assigned to each link of the k-th drug responsiveness network 500[k] is an important problem to be solved by the present invention, and a means of solving it is provided by the specific embodiments of the present invention described below.

Different drug responsiveness networks are subnetworks having different substructures of the nominal network. Therefore, even if a link exists in common in two different drug responsiveness networks, the weight assigned to that link may have a different value in each of the two networks.

In one embodiment of the present invention, the process of determining the weights assigned to the links of each of a plurality of drug responsiveness networks may be performed independently for each drug responsiveness network.

FIG. 1C shows a drug responsiveness network in which cell mutation information is reflected.

When a mutation exists at node 7 of the nominal network 500, a drug responsiveness network [7][1] can be defined by applying the mutation information to the first drug responsiveness network 500[1] shown in FIG. 1B.

Similarly, when a mutation exists at node 6 of the nominal network 500, a drug responsiveness network [6][1] can be defined by applying the mutation information to the first drug responsiveness network 500[1] shown in FIG. 1B.

A drug responsiveness network in which cell mutation information is reflected in this way may be referred to as a specific perturbation network.
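The way mutation information is reflected in a drug responsiveness network can be sketched as below. The dictionary layout and the function name `make_specific_perturbation_network` are hypothetical, and ignoring mutated nodes that lie outside the drug responsiveness network is an assumption of this sketch, not a statement from the text.

```python
def make_specific_perturbation_network(drug, network_nodes, mutated_nodes):
    """Attach mutation information to a drug responsiveness network.

    Assumption: only mutations at nodes belonging to the network are recorded.
    """
    nodes = frozenset(network_nodes)
    return {
        "drug": drug,
        "nodes": nodes,
        "mutated": frozenset(mutated_nodes) & nodes,
    }


# Node set of the first drug responsiveness network 500[1], as given in the text.
network_1_nodes = {3, 5, 6, 7}

net_7_1 = make_specific_perturbation_network(1, network_1_nodes, {7})  # network [7][1]
net_6_1 = make_specific_perturbation_network(1, network_1_nodes, {6})  # network [6][1]
```

Both results share the same underlying node set and drug index and differ only in which node carries the mutation.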

Although FIG. 1C illustrates two specific perturbation networks defined from the first drug responsiveness network 500[1], it is readily understood that a larger number of specific perturbation networks can be defined using other mutation information.

FIGS. 1A, 1B, and 1C described above may be collectively referred to as FIG. 1.

<Process of determining the weights of a specific drug responsiveness network>

FIG. 2A illustrates a method of defining and generating a plurality of different specific perturbation networks from a specific drug responsiveness network according to an embodiment of the present invention.

The left side of FIG. 2A shows the k-th drug responsiveness network 500[k], which is the drug responsiveness network for drug [k]. When the mutation information of p_k different cell lines is reflected in the k-th drug responsiveness network 500[k], p_k different specific perturbation networks can be defined. For example, when the p-th piece of mutation information, out of the mutation information of a total of p_k different cell lines prepared in advance, is reflected in the k-th drug responsiveness network 500[k], the specific perturbation network [p][k] is generated.

The specific perturbation network [p][k] can output the predicted death probability y[p][k] of cancer cell line [p] when drug [k] is administered to cancer cell line [p].

The p_k different cell lines may be selected from a population of P cell lines (P > p_k). Likewise, the mutation information of the p_k different cell lines may be selected from the mutation information of the P cell lines.

Here, information on responsiveness to drug [k] may not exist for all of the P cell lines. For example, experiments administering drug [k] may have been performed on some of the P cell lines but not on the remaining cell lines. That is, information on responsiveness to drug [k] may exist only for some of the P cell lines.

The p_k different cell lines used to generate specific perturbation networks from the k-th drug responsiveness network 500[k] may consist of those cell lines, among the P cell lines, for which information on responsiveness to drug [k] exists.
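The selection of the p_k cell lines can be sketched as a simple filter. This is only an illustrative sketch: the dictionary layout, the function name `cell_lines_for_drug`, and the sample identifiers are assumptions made for the example and do not appear in the specification.

```python
def cell_lines_for_drug(mutation_info, response_data, drug):
    """Return the cell lines (out of the P available) that have
    response data for the given drug.

    mutation_info: dict mapping cell line id -> mutation information
    response_data: dict mapping (cell line id, drug id) -> observed value
    Both layouts are hypothetical, chosen only for this sketch.
    """
    return [line for line in mutation_info if (line, drug) in response_data]


# Three cell lines in the population (P = 3); only two were tested with drug "k".
mutation_info = {"line1": {"nodeA"}, "line2": {"nodeB"}, "line3": {"nodeC"}}
response_data = {("line1", "k"): 0.7, ("line3", "k"): 0.2, ("line2", "other"): 0.5}
selected = cell_lines_for_drug(mutation_info, response_data, "k")
```

Here `len(selected)` plays the role of p_k, which is why p_k can differ from drug to drug.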

The number of specific perturbation networks obtainable from each drug responsiveness network may differ from network to network. For example, if p₁ specific perturbation networks can be obtained from the first drug responsiveness network for drug [1] and p₂ specific perturbation networks can be obtained from the second drug responsiveness network for drug [2], p₁ may differ from p₂.

In the specific perturbation network [p][k], the state transition equations that determine the state value of each node over time may already be defined.

For example, the state transition equations of the specific perturbation network [p][k] may be basically the same as those of the k-th drug responsiveness network 500[k]. However, for example, only the one or more state transition equations that determine the states of the nodes corresponding to the positions of the mutations present in the specific perturbation network [p][k] may be modified.

FIG. 2B describes, in a different way, the method of generating the specific perturbation network of FIG. 2A.

The specific network [p] can be generated by applying the information MN[p] on the mutated nodes of cancer cell line [p] to the nominal network 500.

The specific perturbation network [p][k] can then be generated by applying the information PT[k] on the perturbation target nodes of drug [k] to the generated specific network [p].

Here, the specific perturbation network [p][k] can output the predicted death probability y[p][k] of cancer cell line [p] for the case where drug [k] is administered to cancer cell line [p].
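The two-stage construction of FIG. 2B can be sketched as follows. The specification does not state how MN[p] or PT[k] alter the network, so this sketch assumes, purely for illustration, that both are modeled by pinning the affected nodes to fixed state values. The `Network` class, the function names, and the node names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Network:
    """Weighted links plus nodes whose state is pinned to a fixed value."""
    links: dict                                  # (source, target) -> weight
    pinned: dict = field(default_factory=dict)   # node -> fixed state value


def apply_mutations(nominal, mutated_nodes):
    """Nominal network + MN[p] -> specific network [p] (assumed: pin nodes)."""
    return Network(nominal.links, {**nominal.pinned, **mutated_nodes})


def apply_drug(specific, target_nodes):
    """Specific network [p] + PT[k] -> specific perturbation network [p][k]."""
    return Network(specific.links, {**specific.pinned, **target_nodes})


nominal = Network({("n1", "n2"): 1.0, ("n2", "n3"): 1.0})
specific_p = apply_mutations(nominal, {"n2": 1})      # MN[p]: n2 stuck active
perturbed_pk = apply_drug(specific_p, {"n3": 0})      # PT[k]: n3 inhibited
```

Returning a new `Network` at each stage leaves the nominal network untouched, so the same nominal network can be reused for every cell line and every drug.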

FIGS. 2A and 2B described above may be collectively referred to as FIG. 2.

FIG. 3 illustrates a method of determining the weights assigned to the links of a specific drug responsiveness network according to an embodiment of the present invention.

FIG. 3 presents a framework for determining the weights of the links of the k-th drug responsiveness network 500[k] associated with drug [k]. The framework may use the set of specific perturbation networks [p][k] generated from the k-th drug responsiveness network 500[k] as described with reference to FIG. 2 (p = 1, 2, 3, ..., p_k). The framework may also use the reward calculation unit 30 and the agent 20.

In this specification, the agent 20 may also be referred to as a weight determination agent.

The agent 20 may be an information processing module that includes a neural network. The agent 20 includes a learnable network, such as a machine learning network or a neural network, and may include a plurality of layers. The neural network can be trained by reinforcement learning. The agent 20 used in FIG. 3 may already have been trained. A specific method of training the agent 20 is described later in this specification.

The set of specific perturbation networks [p][k] shown in FIG. 3 can output a set of predicted death probabilities y[p][k] of the cancer cell lines [p] (p = 1, 2, 3, ..., p_k).

Hereinafter, y[p][k] output for drug [k] may be written simply as y_p, and the index p_k may be written as N.

A predicted value vector Y = {y₁, y₂, y₃, ..., y_N} can now be generated from the p_k predicted death probabilities y_p, that is, from the N predicted death probabilities y_p.

In addition, results of in vitro experiments observing the actual death rate of cancer cell line [p] when drug [k] is administered to cancer cell line [p] can be prepared. These in vitro results may be obtained from existing public data. Accordingly, N observed values z_p of the actual death rate of cancer cell line [p] under drug [k] may also be available, for p = 1 to p = N (N = p_k). An observed value vector Z = {z₁, z₂, z₃, ..., z_N} can be generated from the N observed values z_p.

FIG. 4 is a block diagram explaining the function of the reward calculation unit, which, for the execution of one learning step according to an embodiment of the present invention, calculates a reward value from the predicted death probability y_p calculated for cancer cell line [p] and the observed value z_p of the actual death rate of cancer cell line [p] when drug [k] is administered to it (p = 1, 2, 3, ..., N).

The reward calculation unit 30 can calculate the reward value only after it has received both the predicted values y_p and the observed values z_p for all values from p = 1 to p = N.

In this specification, the 'predicted value' may be referred to as the 'simulation predicted value', and the 'observed value' may be referred to as the 'in vitro observed value'.

The error calculation unit 31 obtains the distance between the predicted value vector Y, composed of the predicted values y_p for all values from p = 1 to p = N, and the observed value vector Z, composed of the observed values z_p for all values from p = 1 to p = N, and may regard this distance as the prediction error Err(i) of the specific perturbation networks. Here, i is an index denoting the i-th learning step (learning iteration). The error calculation unit 31 may then output a first value h/Err(i) that is inversely proportional to the prediction error Err(i).
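The computation performed by the error calculation unit 31 can be sketched as below. The specification only says "distance", so the Euclidean distance used here is an assumption, as are the function names, the default value of the constant h, and the small `eps` guard against division by zero.

```python
import math


def prediction_error(y, z):
    """Err(i): distance between predicted vector Y and observed vector Z.
    Euclidean distance is assumed; the specification does not fix the metric."""
    if len(y) != len(z):
        raise ValueError("Y and Z must both have N entries")
    return math.sqrt(sum((yp - zp) ** 2 for yp, zp in zip(y, z)))


def inverse_error(err, h=1.0, eps=1e-12):
    """First value h/Err(i); eps is a hypothetical guard for Err(i) == 0."""
    return h / max(err, eps)


err = prediction_error([0.0, 0.0], [3.0, 4.0])
first_value = inverse_error(err, h=10.0)
```

A smaller prediction error gives a larger first value, so improvements in the fit show up as increases in h/Err(i).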

The first value h/Err(i) may be stored in the past error storage unit 32 for later use. That is, for example, the first value h/Err(i) stored in the past error storage unit 32 may be used in connection with the (i+1)-th learning step (learning iteration) executed after the storing.

Likewise, a second value h/Err(i-1), inversely proportional to the prediction error Err(i-1) obtained in the (i-1)-th learning step, may already be stored in the past error storage unit 32.

The reward calculator 33 may calculate the reward value based on the difference between the first value h/Err(i) and the second value h/Err(i-1).

A specific method of calculating the reward value is as follows.

The reward input to the agent 20 at the (i+1)-th learning step can be calculated as follows.

First, the errors Err(0) to Err(i-1) obtained from the 0th learning step through the (i-1)-th learning step are stored.

Then, the maximum can be selected from among the values h/Err(0), h/Err(1), h/Err(2), ..., h/Err(i-1), which are inversely proportional to the errors obtained from the 0th learning step through the (i-1)-th learning step.

Assuming that this maximum is h/Err(j) (j = 0, 1, 2, ..., or i-1), the value d(i) can be calculated as in Equation 1 below.

[Equation 1] d(i) = h/Err(i) - h/Err(j),

where Err(i) is the error value obtained at the i-th learning step.

If d(i) is negative, the reward is set to 0 (zero); if d(i) is positive, the reward may be set to d(i).
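Equation 1 and the clipping rule can be sketched as a single function. The function name and argument layout are hypothetical, and returning 0 when no earlier step exists is an assumption, since the specification does not describe that case.

```python
def reward_from_history(past_inverse_errors, current_inverse_error):
    """Reward for step i.

    past_inverse_errors: [h/Err(0), ..., h/Err(i-1)]
    current_inverse_error: h/Err(i)
    """
    if not past_inverse_errors:
        return 0.0  # assumption: no baseline yet, so no reward
    best_so_far = max(past_inverse_errors)      # h/Err(j)
    d = current_inverse_error - best_so_far     # Equation 1
    return d if d > 0 else 0.0                  # negative d(i) gives zero reward


improved = reward_from_history([0.5, 1.0, 0.75], 1.25)
worse = reward_from_history([0.5, 1.0], 0.75)
```

Because the baseline is the best value seen so far, a reward is given only when the current step beats every earlier step, not merely the previous one.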

FIG. 5 is a block diagram illustrating a process of updating the weights assigned to the links of the nominal network using the calculated reward.

During one learning step, the reward calculation unit 30 outputs a reward value once. The output reward value is input to the agent 20. The agent 20 outputs an action based on the reward value. The action is the set of weights to be assigned to the links of the k-th drug responsiveness network 500[k] in the next learning step. The weights assigned to the links of the k-th drug responsiveness network 500[k] can be updated by applying the output action.

FIG. 6 is a flowchart illustrating a method of updating, through one learning step, the weights assigned to the links of a drug responsiveness network associated with a specific drug, as provided by an embodiment of the present invention.

The flowchart shown in FIG. 6 can be described with reference to FIGS. 2 to 5 together.

The method according to the flowchart of FIG. 6 may be executed by a computing device having a processing unit and a storage unit. The method may include the computing device executing a predetermined learning step.

In this specification, the learning step may also be referred to as a learning iteration.

One execution of the learning step can update the weights of the k-th drug responsiveness network 500[k] once.

Here, the learning step may include steps S10, S20, S30, S40, S50, and S60 below.

Steps S10, S20, S30, S40, S50, and S60 may be performed in the i-th learning step and may be executed again in each subsequent learning step.

In step S10, the computing device may generate N specific perturbation networks [p][k] by applying the mutation information of N cancer cell lines to the k-th drug responsiveness network 500[k] prepared for drug [k] (p = 1, 2, 3, ..., N (= p_k)).

In step S20, the computing device may prepare a vector Y = {y₁, y₂, y₃, ..., y_N} composed of the N death probabilities output by the N specific perturbation networks [p][k], and a vector Z = {z₁, z₂, z₃, ..., z_N} composed of values of the death rates of the N cancer cell lines observed in in vitro experiments in which drug [k] was administered.

In step S30, the computing device may calculate a first value h/Err(i) that is inversely proportional to the distance dist{Y, Z} between vector Y and vector Z.

In step S40, the computing device may calculate a reward value based on the difference between the first value and a predetermined second value.

Here, the second value may be a value h/Err(i-1) that is inversely proportional to the distance between the vector Y and the vector Z prepared in the (i-1)-th learning step, which immediately precedes the i-th learning step.

In step S50, the computing device may input the reward value to the agent 20 so that the agent 20 calculates new weights for the links of the k-th drug responsiveness network 500[k].

In step S60, the computing device may update the k-th drug responsiveness network 500[k] with the newly calculated weights.

The computing device may be configured to execute the learning step repeatedly for a given k-th drug responsiveness network 500[k].

Each time the learning step is repeated, the weights of the links of the k-th drug responsiveness network 500[k] may be updated once. That is, each time the learning iteration is performed once, the weights of the links of the k-th drug responsiveness network 500[k] may be updated once.

FIG. 7 illustrates a method of determining the optimal weights of a drug responsiveness network using the weight update method described with reference to FIG. 6.

The process in which the agent 20 receives one input and produces the corresponding output may be referred to as one learning step.

For the k-th drug responsiveness network 500[k] defined for a given drug [k], the learning step described with reference to FIG. 6 may be executed U times. In the u-th learning step [u], the reward calculation unit 30 may output a reward [u], and the agent 20 may output a weight [u].

That is, repeating the learning step U times can generate a total of U rewards. The best reward value can then be selected from among these U rewards. If a larger reward value is better, the largest reward value can be selected. The reward value selected in this way is the optimal reward value.

Note that the reward value does not necessarily improve as the learning step is repeated. That is, as the learning step is repeated, the reward value may increase and then decrease again, or decrease and then increase again.

Next, the weights calculated in the learning step in which the optimal reward value was generated can be determined as the optimal weights.

The optimal weights determined in this way can be finally adopted as the weights of the links of the k-th drug responsiveness network 500[k].
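The U-step procedure of FIGS. 6 and 7 can be sketched end to end as follows. This is a minimal sketch under several assumptions: `simulate` stands in for building the N specific perturbation networks and reading out Y, `agent` is a trivial stand-in for the trained agent 20, the distance is Euclidean, and the reward of the very first step is taken to be 0. None of these names or choices come from the specification.

```python
import math


def run_episode(simulate, z, weights, agent, num_steps, h=1.0):
    """Run U learning steps and return (optimal weights, optimal reward)."""
    inverse_errors = []   # h/Err(0), h/Err(1), ...
    records = []          # (reward[u], weight[u]) for each step u
    for _ in range(num_steps):
        y = simulate(weights)                               # S10, S20
        inv = h / max(math.dist(y, z), 1e-12)               # S30
        d = inv - max(inverse_errors) if inverse_errors else 0.0
        reward = max(d, 0.0)                                # S40
        inverse_errors.append(inv)
        weights = agent(weights, reward)                    # S50, S60
        records.append((reward, list(weights)))
    best_reward, best_weights = max(records, key=lambda r: r[0])
    return best_weights, best_reward


# Toy stand-ins: one link weight, one cell line, observed death rate 1.0.
toy_simulate = lambda w: [w[0]]
toy_agent = lambda w, reward: [w[0] + 0.25]
best_weights, best_reward = run_episode(toy_simulate, [1.0], [0.0], toy_agent, 3)
```

The weights kept are those output in the step that produced the largest reward, which follows the wording of the specification; a variant could instead keep the weights that were simulated in that step.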

<Determination of K different drug responsiveness networks>

FIG. 8 illustrates the concept of determining a plurality of different drug responsiveness networks from a single given nominal network.

The description given above with reference to FIGS. 2 to 7 applies to one specific drug [k]. The technique described with reference to FIGS. 2 to 7 can be applied independently to each of a plurality of drugs. That is, the weights of the K different drug responsiveness networks 500[k] defined for K different drugs can be determined by applying the technique described with reference to FIGS. 2 to 7.

That is, the structures of the K different drug responsiveness networks 500[k] can be readily determined from a single nominal network. The values of the weights assigned to the links of each of the K different drug responsiveness networks 500[k], however, can be determined by applying the technique according to an embodiment of the present invention described with reference to FIGS. 2 to 7.

<Process of selecting a drug suitable for treating a specific cancer patient>

FIG. 9 illustrates a process of finding a drug suitable for patient [x] using a plurality of different, fully determined drug responsiveness networks according to an embodiment of the present invention.

Now assume that a specific cancer patient, patient [x], requires cancer treatment. Assume also that the mutation information of cell line [x], a cancer cell line of patient [x], can be obtained, and that a drug for treating patient [x] can be selected from a total of K drugs. These assumptions are readily achievable at the current level of technology.

Assume further that completed drug responsiveness networks for the K drugs have already been prepared by the technique of FIGS. 2 to 8 described above.

As shown in FIG. 9, a total of K specific perturbation networks [x][k] can be generated by applying the mutation information of cell line [x] to each of the K drug responsiveness networks 500[k] (k = 1, 2, 3, ..., K).

FIG. 10 illustrates a process of determining the optimal drug for patient [x] using the K specific perturbation networks [x][k] prepared as in FIG. 9.

As shown in FIG. 10, each of the K specific perturbation networks [x][k] can output a simulation predicted value of the death probability of cell line [x] for the case where drug [k] is administered to cell line [x].

Accordingly, the drug or drugs corresponding to the most desirable of the K simulation predicted values can be proposed as a therapeutic agent for patient [x]. The proposed therapeutic agent may then be adopted by a physician or a drug developer.

FIG. 11 illustrates the configuration of a computing device that executes a method of completing a drug responsiveness network by determining its weights according to an embodiment of the present invention.

The computing device 710 may include an I/O interface unit 711, a memory 712, and a CPU 713.

The memory 712 may store first information, which is information on drug responsiveness networks whose weights have not yet been determined. The first information may include information 7121 on the state transition rules of the drug responsiveness networks.

The memory 712 may also store a drug responsiveness network selection instruction code (simply, the first code) 7122, configured to select, from among the drug responsiveness networks whose weights have not been determined, one drug responsiveness network whose weights are to be determined.

The memory 712 may also store a specific perturbation network generation instruction code (simply, the second code) 7123, which generates N different specific perturbation networks by applying the mutation information of N different cancer cell lines to the selected drug responsiveness network.

The memory 712 may also store a weight update instruction code for the selected drug responsiveness network (simply, the third code) 7124, configured to update the weights of the selected drug responsiveness network at each learning step using the method described with reference to FIG. 3.

The memory 712 may also store a weight determination instruction code for the selected drug responsiveness network (simply, the fourth code) 7125, which determines the optimal weight set from among the plurality of weight sets output by the agent 20 over the plurality of learning steps.

The memory 712 may also store second information 7126, which is information on the selected drug responsiveness network whose weights have been determined. The second information may include information on the state transition rules of the drug responsiveness networks and the values of the determined weights.

The CPU 713 can read and use the first information 7121.

The CPU 713 can also read and execute the first to fourth codes 7122 to 7125.

The CPU 713 can also use the weight information generated by the fourth code 7125 to store, in the memory 712, information on the plurality of drug responsiveness networks whose weights have been determined.

By executing the first code, the CPU 713 can carry out a drug responsiveness network selection process that selects, from among the drug responsiveness networks whose weights have not been determined, one drug responsiveness network whose weights are to be determined. In this way, for example, the k-th drug responsiveness network 500[k] of FIG. 2A can be prepared.

By executing the second code, the CPU 713 can carry out a specific perturbation network generation process that generates N different specific perturbation networks by applying the mutation information of N different cancer cell lines to the selected drug responsiveness network. In this way, for example, the specific perturbation networks [p][k] of FIG. 2A can be prepared (p = 1, 2, 3, ..., p_k (= N)).

By executing the third code, the CPU 713 can carry out a weight update process that updates the weights of the selected drug responsiveness network at each learning step. This process can be realized, for example, by the method described with reference to FIG. 3.

By executing the fourth code, the CPU 713 can carry out a weight determination process for the selected drug responsiveness network, which determines the optimal weight set from among the plurality of weight sets output by the agent 20 over the plurality of learning steps. This process can be realized, for example, by using the results of the plurality of learning steps performed during the execution of episode [k] shown in FIG. 7.

Using the I/O interface unit 711, the CPU 713 can provide the information on the selected drug responsiveness network whose weights have been determined to another computing device, or can provide it as information for the execution of a subsequent process on the computing device 710.

FIG. 12 illustrates the configuration of a computing device that executes a simulation method for determining the optimal drug effective for killing a specific cancer cell line according to an embodiment of the present invention.

The computing device 810 may include an I/O interface unit 811, a memory 812, and a CPU 813.

The computing device 810 may receive, through the I/O interface unit 811, information on a group of K candidate drugs and the mutation information of a first cancer cell line. The information on the K candidate drugs may be information that identifies the K drugs. The first cancer cell line may be obtained from the body of a specific patient, the first patient.

The memory 812 may store third information 8121, which is information on drug responsiveness networks whose weights have been determined. These networks may include the K drug responsiveness networks generated for the K drugs. The third information 8121 may be the same as the second information 7126 stored in the memory 712 of FIG. 11. The third information may also include drug responsiveness networks for more than K drugs.

The memory 812 may also store an instruction code (simply, the fifth code) 8122 that generates K specific perturbation networks by applying the mutation information of the first cancer cell line to each of the K drug responsiveness networks whose weights have been determined.

The memory 812 may also store an instruction code (simply, the sixth code) 8123 that calculates the death probability obtainable from each of the K specific perturbation networks. Here, the death probability obtained from the k-th of the K specific perturbation networks may be a simulated value representing the probability that the first cancer cell line dies when drug [k] is administered to it.

그리고 상기 메모리(812)에는 상기 K개의 스페시픽 섭동 네트워크들로부터 얻은 K개의 사멸확률 중에서 선택된 M개의 사멸확률에 대응하는 M개의 약물을 결정하여 최적 약물 후보군에 포함시키고, 상기 최적 약물 후보군을 출력하는 명령코드(간단히, 제7코드)(8124)가 저장되어 있을 수 있다(M<=K). 상기 최적 약물 후보군의 출력은 상기 I/O 인터페이스부(811)를 통해 실행될 수 있다. And in the memory 812, M drugs corresponding to the M death probabilities selected from the K death probabilities obtained from the K specific perturbation networks are determined and included in the optimal drug candidate group, and the optimal drug candidate group is output A command code (simply, seventh code) 8124 may be stored (M<=K). The output of the optimal drug candidate group may be executed through the I/O interface unit 811 .

바람직한 일 실시예에서, 상기 K개의 사멸확률 중 가장 높은 사멸확률에 대응하는 약물을 상기 최적 약물 후보군에 포함시킬 수 있다. In a preferred embodiment, a drug corresponding to the highest death probability among the K death probabilities may be included in the optimal drug candidate group.
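The selection performed by the seventh code can be illustrated with a minimal Python sketch. It assumes that the M drugs are chosen as those with the highest simulated death probabilities, which is one possible selection rule consistent with the preferred embodiment; the function name `select_candidates` and the drug identifiers are hypothetical.

```python
from typing import Dict, List


def select_candidates(death_probs: Dict[str, float], m: int) -> List[str]:
    """Return the M drug identifiers with the highest simulated death probability."""
    if not 0 < m <= len(death_probs):
        raise ValueError("M must satisfy 1 <= M <= K")
    # Sort by probability, descending; ties are broken by name for determinism.
    ranked = sorted(death_probs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [drug for drug, _ in ranked[:m]]


# Hypothetical death probabilities for K = 4 drugs on the first cancer cell line.
probs = {"drug_1": 0.12, "drug_2": 0.81, "drug_3": 0.47, "drug_4": 0.66}
top2 = select_candidates(probs, 2)
```

With M = 1 the sketch reduces to picking the single drug with the highest death probability.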

The CPU 813 may read and use the third information 8121, which is the information 7125 on the selected drug response networks whose link weights have been determined. The third information may include, for example, information on the drug response networks 500[k] shown in FIG. 9 (k = 1, 2, 3, ..., K).

The CPU 813 may read and execute the fifth to seventh codes 8122 to 8124.

By executing the fifth code, the CPU 813 may run a specific perturbation network generation process that generates K specific perturbation networks by applying the mutation information of the first cancer cell line to each of the K weighted drug response networks. The result may include, for example, information on the specific perturbation networks 500[x][k] shown in FIG. 9 (k = 1, 2, 3, ..., K, where x is the index denoting the first cancer cell line).
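The generation process can be sketched as follows. This passage does not specify how mutation information is encoded, so the sketch assumes, purely for illustration, that a mutation fixes ("clamps") the activity of a node; the dictionary layout, the function names, and the node names are all hypothetical.

```python
from copy import deepcopy
from typing import Dict


def apply_mutations(drug_network: dict, mutations: Dict[str, float]) -> dict:
    """Return a copy of one weighted drug response network with mutated nodes clamped.

    Assumption: a mutation is modelled as fixing a node's activity
    (e.g. 0.0 for loss of function, 1.0 for constitutive activation).
    """
    specific = deepcopy(drug_network)  # leave the drug response network untouched
    nodes = set(specific["nodes"])
    # Mutations on nodes that are absent from this network are ignored.
    specific["clamped"] = {n: v for n, v in mutations.items() if n in nodes}
    return specific


def build_specific_networks(drug_networks: Dict[str, dict],
                            mutations: Dict[str, float]) -> Dict[str, dict]:
    """Apply the same cell-line mutation profile to each of the K drug networks."""
    return {drug: apply_mutations(net, mutations) for drug, net in drug_networks.items()}


# Hypothetical two-drug example; node names are placeholders.
drug_networks = {
    "drug_1": {"nodes": ["A", "B", "C"], "links": {("A", "B"): 0.8, ("B", "C"): -0.3}},
    "drug_2": {"nodes": ["B", "D"], "links": {("B", "D"): 0.5}},
}
mutations = {"B": 1.0, "C": 0.0}
specific_networks = build_specific_networks(drug_networks, mutations)
```

One network is produced per drug, so K drug response networks yield K specific perturbation networks for the same cell line.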

By executing the sixth code, the CPU 813 may run a per-network death probability calculation process that calculates the death probability obtainable from each of the K specific perturbation networks. This yields, for example, the simulated death probability of the first cancer cell line when drug [k] shown in FIG. 10 is administered to it (the first cancer cell line corresponds to cell line [x]).

By executing the seventh code, the CPU 813 may run an optimal drug candidate group determination and output process that determines M drugs corresponding to M death probabilities selected from the K death probabilities obtained from the K specific perturbation networks, includes them in the optimal drug candidate group, and outputs the optimal drug candidate group.

Using the I/O interface unit 811, the CPU 813 may provide the information on the determined optimal drug candidate group to another computing device, or may provide it as information for executing a subsequent process of the computing device 810.

The computing device 710 of FIG. 11 and the computing device 810 of FIG. 12 may each be provided independently or may be provided as one integrated device.

FIG. 13 shows the structure of a cancer treatment candidate drug determination system provided according to an embodiment of the present invention.

The cancer treatment candidate drug determination system may include a drug response screening device 600 and a simulation device 80 (computing device 810). The cancer treatment candidate drug determination system may further include the computing device 710 of FIG. 11.

The drug response screening device 600 may include a computing device 610, a drug bank 620, a drug combination device 630, a micropipette 640, a well-matrix dish 650, and a cell imaging device 660.

The computing device 610 may include an I/O interface unit 611, a memory 612, and a CPU 613. The memory 612 may store drug combination instruction code 6121, real-time cell image analysis instruction code 6122, and optimal drug presentation instruction code 6123. The CPU 613 may read the drug combination instruction code 6121, the real-time cell image analysis instruction code 6122, and the optimal drug presentation instruction code 6123, and execute the corresponding drug combination process 6131, real-time cell image analysis process 6132, and optimal drug presentation process 6133, respectively.

The simulation device 80 of FIG. 13 may be the computing device 810 of FIG. 12, or an integrated device combining the computing device 710 of FIG. 11 and the computing device 810 of FIG. 12.

As described with reference to FIG. 11, the simulation device 80 may receive the mutation information of the first cancer cell line and the information on the K drug candidates, and may provide information on the M selected drugs to the drug response screening device 600.

The I/O interface unit 611 may pass the information on the M selected drugs to the CPU 613. Using this information, the drug combination process 6131 executed on the CPU 613 may send the drug combination device 630 and the micropipette 640 a command to extract the M selected drugs from the drug bank 620 and inject them into the well-matrix dish 650. The command may be delivered through the I/O interface unit 611.

The drug bank 620 may be a drug store in which a plurality of drugs, including at least the M selected drugs, are kept ready.

Alternatively, the drug bank 620 may be a drug store in which a plurality of drugs, including at least the K drug candidates, are kept ready.

The drug combination device 630 may be a mechanical device configured to extract drugs stored in the drug bank 620 and supply them to the micropipette 640.

When one of the M selected drugs is a single drug, referred to as a first drug, the drug combination device 630 may extract the first drug from the drug bank 620 and supply it to the micropipette 640.

When one of the M selected drugs is a combination of a first drug and a second drug, the drug combination device 630 may extract the first drug and the second drug from the drug bank 620, combine them, and supply the resulting combination drug to the micropipette 640.

The well-matrix dish 650 may be a dish in which a plurality of wells are formed.

The drug response screening device 600 may be configured to dispense the culture medium of the first cancer cell line into M wells of the well-matrix dish 650 and hold it there.

The micropipette 640 may inject the drug or drug combination supplied by the drug combination device 630 into one of the plurality of wells formed in the well-matrix dish 650.

The M selected drugs may each be injected into a respective one of the M wells holding the culture medium of the first cancer cell line.
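The handling of single and combination drugs described above can be sketched as a simple dispensing plan. The data layout (a tuple of component names per selected drug, a list of well identifiers) and the function name `plan_dispensing` are assumptions made for illustration and are not part of the disclosed device interface.

```python
from typing import Dict, List, Sequence, Tuple


def plan_dispensing(selected: Sequence[Tuple[str, ...]],
                    well_ids: Sequence[str]) -> List[Dict[str, object]]:
    """Assign each selected drug (single or combination) to its own well."""
    if len(well_ids) < len(selected):
        raise ValueError("need at least one well per selected drug")
    plan = []
    for well, components in zip(well_ids, selected):
        plan.append({
            "well": well,
            "extract": list(components),      # drugs to take from the drug bank
            "combine": len(components) > 1,   # mix before pipetting if a combination
        })
    return plan


# Hypothetical selection: one single drug and one two-drug combination (M = 2).
selected = [("drug_2",), ("drug_4", "drug_7")]
plan = plan_dispensing(selected, ["A1", "A2", "A3"])
```

Each entry corresponds to one command for the drug combination device and the micropipette; unused wells are simply left out of the plan.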

The survival rate and death rate of the first cancer cell line held in the M wells may depend on the drug administered.

The real-time cell image analysis process 6132 may, through the I/O interface unit 611, command the cell imaging device 660 to capture images of the first cancer cell line in the M wells and to return the resulting images to the real-time cell image analysis process 6132.

Based on the images transmitted by the cell imaging device 660, the optimal drug presentation process 6133 may generate, for each of the M wells, values describing the degree of cell growth and the degree of cell death, such as the cell death rate, the cell growth rate, and the area of the cell region within the well. Based on the generated information, it may output, using a display screen, a speaker, a printer, or the like, information on those of the M selected drugs that the in vitro experiment indicates are effective in killing the cancer cells. Alternatively, the drug response screening device 600 may output the in vitro experiment result for each of the M selected drugs.
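As one possible illustration of how image-derived values could be turned into a ranking, the sketch below assumes that each well yields a time series of cell-region areas and that a larger relative shrinkage indicates a more effective drug. This scoring rule, the function names, and the numbers are assumptions; the description does not fix a particular metric.

```python
from typing import Dict, List, Sequence


def relative_area_change(areas: Sequence[float]) -> float:
    """Relative change of the cell-region area between the first and last image."""
    if len(areas) < 2 or areas[0] <= 0:
        raise ValueError("need at least two images and a positive initial area")
    return (areas[-1] - areas[0]) / areas[0]


def rank_by_shrinkage(area_series: Dict[str, Sequence[float]]) -> List[str]:
    """Order drugs from the strongest area decrease to the strongest increase."""
    return sorted(area_series, key=lambda drug: relative_area_change(area_series[drug]))


# Hypothetical areas (arbitrary units) measured in three wells over three images.
observed = {
    "drug_2": [100.0, 80.0, 40.0],
    "drug_3": [100.0, 95.0, 90.0],
    "drug_4": [100.0, 110.0, 130.0],
}
ranking = rank_by_shrinkage(observed)
```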

In one embodiment, all of the K drug candidates may be drugs approved for administration to actual patients. For example, the K drug candidates, or the drugs in the drug bank, may all be FDA-approved drugs that can be prescribed directly to patients. In this case, the information output by the drug response screening device 600 may be regarded as the final drug candidates for treating the cancer patient from whom the first cancer cell line was obtained. The in vitro experiment results for the M selected drugs output through the I/O interface unit 611 may then be useful information for the physician who examines and treats the patient.

In another embodiment, all of the K drug candidates may be single drugs or combination drugs that show efficacy against cancer among candidate substances under development as drugs. That is, the K drug candidates, or the drugs in the drug bank, may consist of new drug candidates under development. Furthermore, they may be drugs that have not yet been approved by the FDA. In this case, the information output by the drug response screening device 600 may not be used as information for treating the cancer patient from whom the first cancer cell line was obtained. However, the in vitro experiment results for the M selected drugs output through the I/O interface unit 611 may be useful information for developers of new drugs.

Thus, by changing the composition of the drug bank included in the drug response screening device provided according to an embodiment of the present invention, a technology that is used directly in patient treatment, or one that can be used directly in new drug development, may be provided.

Two or more of the computing device 710 of FIG. 11, the computing device 810 of FIG. 12, and the computing device 610 of FIG. 13 may be provided as one integrated device.

Hereinafter, FIGS. 14A and 14B, described below, may be referred to collectively as FIG. 14.

FIG. 14 is a framework illustrating how the completed agent shown in FIG. 3 is trained.

The agent 20 may determine the weights assigned to the links of the specific perturbation network, which consists of nodes and links. Only when the weights assigned to the links of the specific perturbation network are set to appropriate values can the specific perturbation network output the death probability of a cell line more accurately.

The structure of the agent 20 may be designed in advance, but the input/output characteristics of the agent 20, that is, the values of the parameters internal to the agent 20 that govern its operation, must be updated from predetermined initial values to optimal values. For this update, the agent 20 must be trained.

To train the agent 20, some or all of a plurality of training cancer cell lines and a plurality of drugs (drug combinations) may be used as training data.

For one learning step, information on one set of training cancer cell lines and one drug may be used. Using the in vitro experiment device 90, the one drug may be administered in vitro to each cell line in the set of training cancer cell lines, and the death rates of the cell lines in the set may be observed and determined. The death rates may be presented, for example, as a vector Z consisting of N scalar values.

A set of specific perturbation networks may then be generated by perturbing a set of specific networks, each of which models one cell line in the set of training cancer cell lines. The node perturbed in each specific network is the node corresponding to the protein on which the selected drug acts.

A set of death probabilities obtainable from the set of specific perturbation networks may then be calculated. The death probabilities may be presented, for example, as a vector Y consisting of N scalar values.

The agent 20 may receive, as input data, the reward and/or the link weights that were assigned to the drug response network.

Based on the data input to it, the agent 20 may output updated values of the weights to be assigned to the links of the drug response network in the next learning step.
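The relationship between the simulated vector Y, the observed vector Z, and the reward can be sketched as follows. The exact reward definition is not restated in this passage, so the sketch assumes, for illustration only, a negative mean squared error between Y and Z, so that a closer match gives a larger reward; the function name `reward` is hypothetical.

```python
from typing import Sequence


def reward(y: Sequence[float], z: Sequence[float]) -> float:
    """Assumed reward: negative mean squared error between simulated and observed values."""
    if len(y) != len(z) or len(y) == 0:
        raise ValueError("Y and Z must be non-empty and of equal length N")
    return -sum((a - b) ** 2 for a, b in zip(y, z)) / len(y)


# Hypothetical values for N = 2 training cancer cell lines.
z_observed = [0.0, 0.0]
r_far = reward([1.0, 0.0], z_observed)    # simulation far from the observation
r_near = reward([0.5, 0.0], z_observed)   # simulation closer to the observation
```

Under this assumption, weight updates that bring Y closer to Z increase the reward fed back to the agent at the next learning step.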

A set of consecutively executed learning steps may be referred to as a learning episode.

In one embodiment, all learning steps within one episode may be limited to a single drug as training data. The training drug can be changed only when the episode changes. However, the first set of training cancer cell lines used in a first learning step of an episode and the second set of training cancer cell lines used in a second learning step of the same episode may differ in composition.

The weight values of the nominal network may be updated once per learning step. The agent 20 may be trained once per episode, where each episode consists of a plurality of learning steps. Each time an episode is repeated, the amount of training of the agent 20 increases.

In an embodiment of the present invention, a total of K episodes may be executed to train the agent 20. In this specification, an "episode" is the unit in which the agent 20 is trained once. That is, when K episodes have been executed, the agent 20 has been trained K times. In an embodiment of the present invention, one episode is associated with exactly one drug.

One episode may be executed by performing the learning step described with reference to FIG. 7 a plurality of times. This is described in detail below.

FIG. 14A shows the framework of the k-th episode among the K episodes used to train the not-yet-completed agent 20.

The structure shown in FIG. 14A is the same as that shown in FIG. 3. The difference is that the agent 20 of FIG. 3 has completed all K rounds of training, whereas the agent shown in FIG. 14A has not yet completed all K rounds.

FIG. 14B shows the process of completing the training of the agent by executing a plurality of episodes according to an embodiment of the present invention.

When one episode finishes, the agent 20 may be trained once.

The k-th episode may include the following steps (k = 1, 2, 3, ..., K).

First, p_k cell lines for which information on responsiveness to drug [k] has been prepared through in vitro experiments may be selected, and the mutation information of the p_k cell lines may be prepared.

Second, p_k specific perturbation networks may be generated by applying the prepared mutation information of the p_k cell lines to the drug response network for drug [k].

Third, the framework of FIG. 14A may be constructed using the generated p_k specific perturbation networks.

Fourth, the learning step may be executed U_k times using the constructed framework of FIG. 14A.

When the k-th episode is completed, the agent 20 may be trained once using the U_k sets of link weights output by the agent 20 while the learning step was executed U_k times, and the U_k rewards input to the agent 20.
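The episode structure described above can be sketched as a loop that collects U_k pairs of link weights and rewards, after which the agent would be updated once. The callables `propose` (standing in for the agent) and `evaluate` (standing in for simulating the specific perturbation networks and scoring the result) are hypothetical placeholders, and the toy behavior below is not the disclosed agent.

```python
from typing import Callable, List, Sequence, Tuple

Weights = List[float]


def run_episode(initial_weights: Sequence[float],
                num_steps: int,
                propose: Callable[[Weights, float], Weights],
                evaluate: Callable[[Weights], float]) -> List[Tuple[Weights, float]]:
    """Run U_k learning steps and return the (output weights, input reward) pairs."""
    weights = list(initial_weights)
    trajectory = []
    for _ in range(num_steps):
        r = evaluate(weights)             # reward for the current link weights
        new_weights = propose(weights, r)  # agent output: weights for the next step
        trajectory.append((list(new_weights), r))
        weights = new_weights
    return trajectory


# Toy stand-ins: the "agent" moves each weight halfway toward 1.0, and the
# reward is the negative squared distance of the weights from 1.0.
def toy_propose(w: Weights, r: float) -> Weights:
    return [x + 0.5 * (1.0 - x) for x in w]


def toy_evaluate(w: Weights) -> float:
    return -sum((x - 1.0) ** 2 for x in w)


trajectory = run_episode([0.0, 0.0], 3, toy_propose, toy_evaluate)
rewards = [r for _, r in trajectory]
```

The returned trajectory holds exactly the quantities named in the text, so a single agent update per episode can consume it as one batch.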

In one embodiment, if k1 and k2 differ, p_k1 and p_k2 may differ from each other, and U_k1 and U_k2 may differ from each other.

Because the agent 20 is trained through the process of determining the weights of K different drug response networks, its use need not be limited to determining the weights of the drug response network for one specific drug.

<Method of obtaining the death rate Z by administering a specific drug to cancer cell lines>

FIG. 15 shows the configuration of a system that, according to an embodiment of the present invention, administers a specific drug to cancer cell lines and obtains and provides the death rate Z.

The biological signaling network generation system 100 may include a computing device 50, a cell line experiment device 60, and a data server 70.

The cell line experiment device 60 may include a cell line container 61, a drug administration device 62, and a cell line state observation device 63.

Cancer cell lines, for example, may be provided separately in the plurality of wells of the cell line container 61.

The drug administration device 62 may administer a selected specific drug to the cancer cell lines provided in the cell line container 61.

The cell line state observation device 63 may observe and output the death rates of the cancer cell lines after the specific drug has been administered.

The cell line experiment device 60 may be configured to provide the observed death rates to the computing device 50.

The data server 70 may provide the computing device 50 with information on the drug response network for the specific drug. The information on the drug response network may include the interconnection structure of the nodes and links of the sub-network, within the nominal network of the cancer cell lines, that responds to the drug. The data server 70 may also provide the computing device 50 with information on the nodes affected by the specific drug when it is administered to the cancer cell lines.

The computing device 50 may include a processing unit 51, a storage unit 52, and a user interface 53.

The user interface 53 may receive, from a user, information indicating the specific drug and information indicating the cancer cell lines.

The computing device 50 may transmit the input information indicating the cancer cell lines and the information indicating the specific drug to the cell line experiment device 60, and request from it the observed death rates of the cancer cell lines after the specific drug has been administered to them. The observed death rates obtained from the cell line experiment device 60 may be stored in the storage unit 52 of the computing device 50.

The processing unit 51 is configured to perform a plurality of learning steps for determining the weights of the drug response network of the specific drug. An example of this is described with reference to FIGS. 3 to 7.

<Operating principle of the agent>

The operating principle of the agent 20 is described below.

In the current learning step, the current weights, which are the weights assigned to the links of the network (500, 520, or 521), and the node feature values are input to a graph neural network. The output of the graph neural network is then concatenated with the reward value obtained for the current weights and used as the input to a recurrent neural network (RNN). The RNN may output an action by combining the input of the current learning step (weights and reward) with a hidden state that carries information from previous learning steps. The action may be the updated weights used in the next learning step, that is, the weights assigned to the links of the network (500, 520, or 521) in the next learning step.
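How a recurrent unit combines the current input with a hidden state can be illustrated with a minimal sketch. It uses a plain tanh recurrent cell rather than the LSTM layers named elsewhere in this description, and the parameter names (`w_x`, `w_h`, `b`) and numeric values are assumptions chosen only to make the example runnable.

```python
import math
from typing import List, Sequence


def rnn_step(x: Sequence[float], h: Sequence[float],
             w_x: Sequence[Sequence[float]], w_h: Sequence[Sequence[float]],
             b: Sequence[float]) -> List[float]:
    """One step of a plain tanh recurrent cell: h' = tanh(W_x x + W_h h + b)."""
    new_h = []
    for row_x, row_h, bias in zip(w_x, w_h, b):
        pre = (sum(a * v for a, v in zip(row_x, x))
               + sum(a * v for a, v in zip(row_h, h))
               + bias)
        new_h.append(math.tanh(pre))
    return new_h


# Hypothetical input: an embedded weight feature and a reward value.
x = [1.0, 0.5]
w_x, w_h, b = [[1.0, 1.0]], [[0.5]], [0.0]
h1 = rnn_step(x, [0.0], w_x, w_h, b)   # first learning step, empty history
h2 = rnn_step(x, h1, w_x, w_h, b)      # same input, but the history now matters
```

Feeding the same input twice gives different outputs because the hidden state carries over, which is what allows the action to depend on earlier learning steps.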

The agent 20 may consist of three parts: an input layer, a submodule layer, and a main layer.

The input layer may include a graph module for embedding the graph and a message module for communication between agents. The graph module contains two graph neural networks of identical structure: one (G) serves the global state estimator and the main layer, and the other (G^C) serves the context estimator. The message module (G^m) may be used by all modules of the submodule layer and the main layer. Every module in the input layer may have a graph neural network structure. At each learning step, the graph module may receive node features (eight centrality measures) and link features (weight and edge betweenness centrality) and output node, link, and global features. The message module receives a zero vector as input at the start of each episode and, in subsequent learning steps, may recursively receive its previously computed value. For each link, the outputs of the three modules may be reconstructed by concatenating the features of the source node, the features of the target node, the link features, and the global state, and the result may be used as the state input to the subsequent modules. The state of the i-th link reconstructed from each of the three modules (G, G^C, and G^m) is expressed by Formula 1.

[Formula 1]

$$
\begin{aligned}
L_i &= (n_{i_s},\ n_{i_t},\ l_i,\ g), \\
L_i^C &= (n_{i_s}^C,\ n_{i_t}^C,\ l_i^C,\ g^C), \\
L_i^m &= (n_{i_s}^m,\ n_{i_t}^m,\ l_i^m,\ g^m)
\end{aligned}
$$

The modules of the submodule layer and the main layer may receive the states reconstructed in this way as input.
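The per-link reconstruction of Formula 1 amounts to a concatenation, which can be sketched as follows. The list-based data layout, the function name `link_state`, and the feature values are illustrative assumptions; the same function would be applied to the outputs of each of the three modules.

```python
from typing import List, Sequence, Tuple


def link_state(i: int,
               links: Sequence[Tuple[int, int]],
               node_features: Sequence[Sequence[float]],
               link_features: Sequence[Sequence[float]],
               global_state: Sequence[float]) -> List[float]:
    """Build L_i = (n_{i_s}, n_{i_t}, l_i, g) for the i-th link by concatenation."""
    src, tgt = links[i]  # indices of the source and target nodes of link i
    return (list(node_features[src]) + list(node_features[tgt])
            + list(link_features[i]) + list(global_state))


# Hypothetical graph with three nodes and two links.
links = [(0, 1), (1, 2)]
node_features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
link_features = [[0.9], [0.7]]
global_state = [1.0]
state_1 = link_state(1, links, node_features, link_features, global_state)
```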

The submodule layer may include the context estimator module and the global state estimator module. Both modules are single-layer LSTMs that share weights across the individual coordinate inputs while maintaining independent hidden states. The context estimator is a module that estimates information about the environment: it feeds L^C, L^m, and the reward value of the previous learning step into the LSTM, followed by two dense layers with ELU activation, and outputs the information about the environment. The global state estimator is a module for learning the inter-agent communication protocol: it may receive L, L^m, and the action of the current learning step as input and output the state of the next step (the L of the next step).

In the main layer, L, L^m, the reward value of the previous learning step, and the outputs of the two submodules may be fed into a two-layer LSTM, followed by one dense layer with ELU activation, to output the action.
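The dense layers with ELU activation mentioned above can be illustrated with a small sketch. ELU is the standard exponential linear unit, returning x for x > 0 and alpha * (exp(x) - 1) otherwise; the layer sizes, weights, and the function name `dense_elu` are assumptions for illustration and do not reflect the trained agent.

```python
import math
from typing import List, Sequence


def elu(x: float, alpha: float = 1.0) -> float:
    """Exponential linear unit: identity for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def dense_elu(inputs: Sequence[float],
              weights: Sequence[Sequence[float]],
              biases: Sequence[float]) -> List[float]:
    """Fully connected layer followed by ELU, with one output per weight row."""
    return [elu(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Hypothetical 2-input, 2-output layer.
out = dense_elu([1.0, 2.0], [[1.0, 1.0], [-1.0, -1.0]], [0.0, 0.0])
```

Unlike ReLU, ELU keeps a smooth negative output for negative pre-activations, which is why the second output above is a small negative number instead of zero.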

<본 발명을 이용한 비즈니스 모델><Business model using the present invention>

도 16a는 본 발명을 이용한 비즈니스 모델의 일 예를 나타낸 것이다.16A shows an example of a business model using the present invention.

기술 공급자는 도 14에 제시한 기술로 학습을 완료한 에이전트(20)를 컴퓨팅 장치(710)에 제공할 수 있다. 그리고 컴퓨팅 장치(710)는 가중치가 결정된 복수 개의 약물 반응성 네트워크들의 구조 및 파라미터(가중치) 정보를 출력할 수 있다. 이 정보는 기술 수요자에게 전달될 수 있다. 상기 에이전트를 학습시키는 작업 역시 상기 컴퓨팅 장치(710)에서 실행될 수도 있다.The technology provider may provide the computing device 710 with the agent 20 that has completed learning with the technology shown in FIG. 14 . In addition, the computing device 710 may output structure and parameter (weight) information of the plurality of drug responsive networks whose weights are determined. This information can be passed on to technology consumers. A task of learning the agent may also be executed in the computing device 710 .

The technology consumer may input, into the computing device 810, the structure and parameter (weight) information of the plurality of drug responsiveness networks whose weights have been determined, together with the mutation information of the first cancer cell line and information on a group of K candidate drugs (see FIG. 13). The computing device 810 may provide information on the M selected drugs to the drug response screening device 600. The drug response screening device 600 may output the in vitro experiment results for the M selected drugs that medical personnel or new drug developers need.
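The narrowing from K candidate drugs to M selected drugs on the computing device 810 can be sketched as follows. This is a minimal illustration under stated assumptions: each weighted drug responsiveness network is represented by a hypothetical callable that maps the mutation set of the first cancer cell line to a death probability, and the top M drugs by predicted death probability are chosen. The text only says that selection is based on the death probabilities, so the top-M rule, the function name `select_candidate_drugs`, and the toy predictors are illustrative assumptions.

```python
from typing import Callable, Dict, FrozenSet, List


def select_candidate_drugs(
    networks: Dict[str, Callable[[FrozenSet[str]], float]],
    mutations: FrozenSet[str],
    m: int,
) -> List[str]:
    """Rank K drugs by predicted death probability and keep the top M.

    `networks` maps a drug name to a stand-in for its specific perturbation
    network: a function from mutation information to a death probability.
    """
    scored = [(predict(mutations), name) for name, predict in networks.items()]
    # Highest death probability first; ties are broken by drug name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:m]]


# Toy stand-ins for K = 3 networks (hypothetical, not real response data).
toy_networks = {
    "drug_A": lambda muts: 0.9 if "gene_1" in muts else 0.2,
    "drug_B": lambda muts: 0.6,
    "drug_C": lambda muts: 0.1 if "gene_2" in muts else 0.7,
}

selected = select_candidate_drugs(toy_networks, frozenset({"gene_1", "gene_2"}), m=2)
print(selected)  # ['drug_A', 'drug_B']
```

Only the M selected drugs would then be passed to the drug response screening device 600 for in vitro testing.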

FIG. 16B shows another example of a business model using the present invention.

The technology provider may provide the agent 20, whose training has been completed with the technique shown in FIG. 14, to the technology consumer.

The technology consumer may install the provided agent 20 on the computing device 710. The computing device 710 may output structure and parameter (weight) information for the plurality of drug responsiveness networks whose weights have been determined. The technology consumer may input this structure and parameter (weight) information into the computing device 810, together with the mutation information of the first cancer cell line and information on a group of K candidate drugs (see FIG. 13). The computing device 810 may provide information on the M selected drugs to the drug response screening device 600. The drug response screening device 600 may output the in vitro experiment results for the M selected drugs that medical personnel or new drug developers need.

In FIGS. 16A and 16B, the in vitro experiment results for the M selected drugs may be used by a physician. In this case, the K drugs corresponding to the K-drug candidate group information input to the computing device 810 may be FDA-approved drugs, that is, a set of marketable drugs that a physician can use immediately.

Alternatively, the in vitro experiment results for the M selected drugs may be used by a new drug developer. In this case, the K drugs corresponding to the K-drug candidate group information input to the computing device 810 may be a set of substances that have pharmacological efficacy but are at a stage prior to development into drugs.

Using the embodiments of the present invention described above, those skilled in the art to which the present invention pertains will be able to readily make various changes and modifications without departing from the essential characteristics of the present invention. The content of each claim may be combined with other claims to which it does not refer, within the scope understandable through this specification.

<Acknowledgement>

This invention is the result of the following research project.

* Project identification number: SRFC-IT1802-02

* Research project title: (G01210368) Universal cancer cell simulator (2021)

* Account number: G01210368

* Account manager: Cho Kwang-hyeon (조광현)

* Host organization: Korea Advanced Institute of Science and Technology

* Research management agency: Samsung Electronics Co., Ltd.

* Researcher number: 10103248

* Research period: 2021-07-01 to 2021-11-30

* Contribution rate: 100

Claims

A method for determining candidate drugs for cancer treatment, the method comprising:
generating, by a simulation device 810, a plurality of specific perturbation networks by reflecting mutation information of a first cancer cell line in each of a plurality of drug responsiveness networks for a plurality of drugs;
selecting, by the simulation device, a plurality of candidate drugs from among the plurality of drugs based on a plurality of death probabilities of the first cancer cell line output by the plurality of specific perturbation networks;
providing information on the determined plurality of candidate drugs to a drug response screening device;
executing, by the drug response screening device, an in vitro experiment in which the plurality of candidate drugs are administered to a plurality of wells in which the first cancer cell line is stored;
capturing and analyzing, by the drug response screening device, images of the first cancer cell line in the plurality of wells using a cell image capturing device; and
outputting, by the drug response screening device, in vitro experiment results for at least some of the plurality of candidate drugs based on the analysis results.

The method of claim 1,
further comprising, prior to the generating step, performing, by a computing device 710, a process (= episode) of determining weights of a kth drug responsiveness network that responds to a kth drug among the plurality of drug responsiveness networks,
wherein the process is performed using an agent whose training by reinforcement learning has been completed, and
wherein performing the process comprises:
obtaining, by the computing device, mutation information of N (= p_k) cell lines for which responsiveness information obtained by in vitro experiments using the kth drug among the plurality of drugs is available;
generating, by the computing device, N specific perturbation networks by applying the N pieces of mutation information to the kth drug responsiveness network that responds to the kth drug;
repeating, by the computing device, a learning step of updating weights of links of the kth drug responsiveness network a plurality of times using the agent;
selecting, by the computing device, the learning step having the largest value of a reward provided to the agent among the plurality of learning steps; and
determining, by the computing device, the link weights output by the agent in the selected learning step as the weights of the links of the kth drug responsiveness network.
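The episode described in this claim (repeat the learning step, then keep the link weights from the step with the largest reward) can be sketched as below. This is an illustrative sketch only: `run_episode`, `agent_step`, and `evaluate_reward` are hypothetical names, and the toy agent and toy reward are stand-ins for the trained agent and for the reward derived from the N specific perturbation networks.

```python
from typing import Callable, List, Tuple

Weights = List[float]


def run_episode(
    agent_step: Callable[[Weights, float], Weights],
    evaluate_reward: Callable[[Weights], float],
    initial_weights: Weights,
    n_steps: int,
) -> Tuple[Weights, float]:
    """Repeat the learning step and return the weights of the best-reward step."""
    weights = list(initial_weights)
    reward = 0.0
    best_reward, best_weights = float("-inf"), list(weights)
    for _ in range(n_steps):
        # The agent proposes the next link weights from the current weights and reward.
        weights = agent_step(weights, reward)
        reward = evaluate_reward(weights)
        if reward > best_reward:
            best_reward, best_weights = reward, list(weights)
    return best_weights, best_reward


# Toy stand-ins: the agent adds 0.25 to every weight; the reward peaks at 0.5.
def toy_agent(weights: Weights, reward: float) -> Weights:
    return [w + 0.25 for w in weights]


def toy_reward(weights: Weights) -> float:
    return -sum((w - 0.5) ** 2 for w in weights)


best_w, best_r = run_episode(toy_agent, toy_reward, [0.0, 0.0], n_steps=4)
print(best_w)  # [0.5, 0.5], the weights from the second of four steps
```

Selecting the best step rather than the last one means a late step that overshoots does not degrade the final network weights.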

The method of claim 2,
wherein the agent is configured to determine the weights of the links of the kth drug responsiveness network for a next learning step based on the reward and the weights of the links of the kth drug responsiveness network in a current learning step.

The method of claim 3,
wherein determining the reward comprises:
preparing, by the computing device, in the current learning step among the plurality of learning steps, a vector Y consisting of the N death probabilities output by the N specific perturbation networks, and a vector Z consisting of values related to the death rates of the N first cancer cell lines observed in in vitro experiments in which the kth drug is administered to the first cancer cell lines;
calculating, by the computing device, a first value that is inversely proportional to the distance between the vector Y and the vector Z; and
calculating, by the computing device, the reward based on a difference between the first value and a second value,
wherein the second value is a value inversely proportional to the distance between the vector Y prepared in the learning step immediately preceding the current learning step and the vector Z.
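The reward in this claim can be illustrated with a short sketch. It assumes a Euclidean distance and the form 1/(distance + eps) for the "inversely proportional" value; the claim fixes neither the distance metric nor the proportionality constant, and the small `eps` that guards against division by zero is an added assumption. The function names are hypothetical.

```python
import math
from typing import Sequence


def closeness(y: Sequence[float], z: Sequence[float], eps: float = 1e-8) -> float:
    """Value inversely proportional to the distance between Y and Z (assumed Euclidean)."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, z)))
    return 1.0 / (distance + eps)


def step_reward(y_now: Sequence[float], y_prev: Sequence[float], z: Sequence[float]) -> float:
    """Reward = first value (current step) minus second value (previous step)."""
    return closeness(y_now, z) - closeness(y_prev, z)


# Toy numbers: observed values Z and predictions Y from two consecutive steps.
z_observed = [1.0, 0.0]
y_previous = [0.0, 0.0]   # distance 1.0 from Z
y_current = [0.5, 0.0]    # distance 0.5 from Z
print(step_reward(y_current, y_previous, z_observed) > 0)  # True: predictions moved closer
```

Under this form the reward is positive when the predicted death probabilities move closer to the observed values than in the previous step and negative when they move away.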

The method of claim 2,
further comprising, prior to performing the process (= episode) of determining the weights of the kth drug responsiveness network, training, by the computing device 710, the agent,
wherein, in training the agent, a training process (= episode) of the agent is repeatedly performed for G different drugs, and
wherein the training process of the agent performed for the g-th drug comprises:
obtaining, by the computing device 710, mutation information of p_g cell lines for which responsiveness information obtained by in vitro experiments using the g-th drug is available;
generating, by the computing device, p_g specific perturbation networks by applying the p_g pieces of mutation information to the g-th drug responsiveness network that responds to the g-th drug;
repeating, by the computing device, a learning step of updating weights of links of the g-th drug responsiveness network a plurality of times using the agent; and
training, by the computing device, the agent using the reward values and the weights obtained while repeating the learning step the plurality of times.

The method of claim 5,
wherein the agent is configured to determine the weights of the links of the g-th drug responsiveness network for a next learning step based on the reward and the weights of the links of the g-th drug responsiveness network in a current learning step.

A method for determining candidate drugs for cancer treatment, the method comprising:
performing, by a computing device 710, a process (= episode) of determining weights of a kth drug responsiveness network that responds to a kth drug among a plurality of drug responsiveness networks for a plurality of drugs;
generating, by a simulation device 810, a plurality of specific perturbation networks by reflecting mutation information of a first cancer cell line in each of the plurality of drug responsiveness networks; and
selecting, by the simulation device, a plurality of candidate drugs from among the plurality of drugs based on a plurality of death probabilities of the first cancer cell line output by the plurality of specific perturbation networks,
wherein the process is performed using an agent whose training by reinforcement learning has been completed, and
wherein performing the process comprises:
obtaining, by the computing device, mutation information of N (= p_k) cell lines for which responsiveness information obtained by in vitro experiments using the kth drug among the plurality of drugs is available;
generating, by the computing device, N specific perturbation networks by applying the N pieces of mutation information to the kth drug responsiveness network that responds to the kth drug;
repeating, by the computing device, a learning step of updating weights of links of the kth drug responsiveness network a plurality of times using the agent;
selecting, by the computing device, the learning step having the largest value of a reward provided to the agent among the plurality of learning steps; and
determining, by the computing device, the link weights output by the agent in the selected learning step as the weights of the links of the kth drug responsiveness network.

A system for determining candidate drugs for cancer treatment, the system comprising:
a simulation device 810; and
a drug response screening device 600,
wherein the simulation device is configured to:
generate a plurality of specific perturbation networks by reflecting mutation information of a first cancer cell line in each of a plurality of drug responsiveness networks for a plurality of drugs;
select a plurality of candidate drugs from among the plurality of drugs based on a plurality of death probabilities of the first cancer cell line output by the plurality of specific perturbation networks; and
provide information on the determined plurality of candidate drugs to the drug response screening device, and
wherein the drug response screening device is configured to:
execute an in vitro experiment in which the plurality of candidate drugs are administered to a plurality of wells in which the first cancer cell line is stored;
capture and analyze images of the first cancer cell line in the plurality of wells using a cell image capturing device; and
output in vitro experiment results for at least some of the plurality of candidate drugs based on the analysis results.

The system of claim 8,
further comprising a computing device 710,
wherein the computing device is configured to perform, before the simulation device generates the plurality of specific perturbation networks, a process (= episode) of determining weights of a kth drug responsiveness network that responds to a kth drug among the plurality of drug responsiveness networks,
wherein the process is performed using an agent whose training by reinforcement learning has been completed, and
wherein performing the process comprises:
obtaining, by the computing device, mutation information of N (= p_k) cell lines for which responsiveness information obtained by in vitro experiments using the kth drug among the plurality of drugs is available;
generating, by the computing device, N specific perturbation networks by applying the N pieces of mutation information to the kth drug responsiveness network that responds to the kth drug;
repeating, by the computing device, a learning step of updating weights of links of the kth drug responsiveness network a plurality of times using the agent;
selecting, by the computing device, the learning step having the largest value of a reward provided to the agent among the plurality of learning steps; and
determining, by the computing device, the link weights output by the agent in the selected learning step as the weights of the links of the kth drug responsiveness network.

The system of claim 9,
wherein the agent is configured to determine the weights of the links of the kth drug responsiveness network for a next learning step based on the reward and the weights of the links of the kth drug responsiveness network in a current learning step.

The system of claim 10,
wherein determining the reward comprises:
preparing, by the computing device, in the current learning step among the plurality of learning steps, a vector Y consisting of the N death probabilities output by the N specific perturbation networks, and a vector Z consisting of values related to the death rates of the N first cancer cell lines observed in in vitro experiments in which the kth drug is administered to the first cancer cell lines;
calculating, by the computing device, a first value that is inversely proportional to the distance between the vector Y and the vector Z; and
calculating, by the computing device, the reward based on a difference between the first value and a second value,
wherein the second value is a value inversely proportional to the distance between the vector Y prepared in the learning step immediately preceding the current learning step and the vector Z.

The system of claim 9,
wherein the computing device 710 is configured to train the agent prior to performing the process (= episode) of determining the weights of the kth drug responsiveness network,
wherein, in training the agent, a training process (= episode) of the agent is repeatedly performed for G different drugs, and
wherein the training process of the agent performed for the g-th drug comprises:
obtaining, by the computing device 710, mutation information of p_g cell lines for which responsiveness information obtained by in vitro experiments using the g-th drug is available;
generating, by the computing device, p_g specific perturbation networks by applying the p_g pieces of mutation information to the g-th drug responsiveness network that responds to the g-th drug;
repeating, by the computing device, a learning step of updating weights of links of the g-th drug responsiveness network a plurality of times using the agent; and
training, by the computing device, the agent using the reward values and the weights obtained while repeating the learning step the plurality of times.

The system of claim 12,
wherein the agent is configured to determine the weights of the links of the g-th drug responsiveness network for a next learning step based on the reward and the weights of the links of the g-th drug responsiveness network in a current learning step.

A system for determining candidate drugs for cancer treatment, the system comprising:
a simulation device 810;
a drug response screening device 600; and
a computing device 710,
wherein the computing device is configured to perform a process (= episode) of determining weights of a kth drug responsiveness network that responds to a kth drug among a plurality of drug responsiveness networks for a plurality of drugs,
wherein the simulation device is configured to:
generate a plurality of specific perturbation networks by reflecting mutation information of a first cancer cell line in each of the plurality of drug responsiveness networks; and
select a plurality of candidate drugs from among the plurality of drugs based on a plurality of death probabilities of the first cancer cell line output by the plurality of specific perturbation networks,
wherein the process is performed using an agent whose training by reinforcement learning has been completed, and
wherein performing the process comprises:
obtaining, by the computing device, mutation information of N (= p_k) cell lines for which responsiveness information obtained by in vitro experiments using the kth drug among the plurality of drugs is available;
generating, by the computing device, N specific perturbation networks by applying the N pieces of mutation information to the kth drug responsiveness network that responds to the kth drug;
repeating, by the computing device, a learning step of updating weights of links of the kth drug responsiveness network a plurality of times using the agent;
selecting, by the computing device, the learning step having the largest value of a reward provided to the agent among the plurality of learning steps; and
determining, by the computing device, the link weights output by the agent in the selected learning step as the weights of the links of the kth drug responsiveness network.