KR20230164808A

KR20230164808A - Transcriptome-based synthetic lethality prediction device, method and computer program

Info

Publication number: KR20230164808A
Application number: KR1020220064355A
Authority: KR
Inventors: 오용호; 표준희; 임원준
Original assignee: 주식회사 디파이브테라퓨틱스
Priority date: 2022-05-25
Filing date: 2022-05-25
Publication date: 2023-12-05
Also published as: WO2023229142A1

Abstract

According to embodiments of the present disclosure, a synthetic lethality prediction method is disclosed, comprising: receiving, by a synthetic lethality prediction device, data on expression changes of genes from an external database; determining, using a feature selection unit, a first number of input values suitable for predicting a synthetic lethal relationship; obtaining, from the expression changes of the genes, a set of input values equal in number to the first number; inputting the set of expression changes of the first number of genes into at least one of first to eighth synthetic lethality prediction units, thereby training and generating the first to eighth synthetic lethality prediction units; and inputting the expression changes of the genes into at least one of the first to eighth synthetic lethality prediction units and outputting one or more genes having a new synthetic lethal relationship with a target gene.

Description

Transcriptome data-based synthetic lethality prediction device, synthetic lethality prediction method, and computer program {Transcriptome-based synthetic lethality prediction device, method and computer program}

Embodiments of the present disclosure relate to a transcriptome-data-based synthetic lethality prediction device, a synthetic lethality prediction method, and a computer program. More specifically, they are characterized by inputting data about genes into a feature selection unit and a synthetic lethality prediction unit, each trained on data such as the expression changes of genes in cell lines, and outputting whether a synthetic lethal relationship exists.

Cancer is a leading cause of death. One novel approach to developing cancer therapies is based on the concept of synthetic lethality. Synthetic lethality refers to the situation in which, of two genes with complementary functions important for cancer cell survival, a cancer cell survives if only one of the genes is mutated but dies if both are mutated. In other words, synthetic lethality describes a situation in which a mutation and a drug act together to inhibit the functions of two genes and thereby kill cancer cells, whereas neither the mutation nor the drug alone kills them. Targeting a gene (or gene product) that is synthetic lethal with a cancer-associated mutation therefore kills only the cancer cells and spares normal cells. Synthetic lethality thus provides a framework for developing cancer-specific anticancer agents.

Therefore, to develop anticancer therapies related to a target gene, there is a growing need to discover new genes that are in a synthetic lethal relationship with that target gene.

The embodiments disclosed in this specification aim to provide a synthetic lethality prediction device and method that detect new synthetic lethal relationships beyond those previously discovered.

The embodiments disclosed herein also aim to provide a synthetic lethality prediction device and method that can be used to discover targets and biomarkers for new drug development based on newly identified synthetic lethal relationships.

The embodiments disclosed herein further aim to provide a synthetic lethality prediction device and method that detect new synthetic lethal relationships for each gene, use them to discover targets for new drug development, and increase the probability of success of new drug development through biomarker-based clinical development.

A method according to embodiments of the present disclosure may include: receiving, by a synthetic lethality prediction device, data on expression changes of genes from an external database; determining, using a feature selection unit, a first number of input values suitable for predicting a synthetic lethal relationship; obtaining, from the expression changes of the genes, a set of input values equal in number to the first number; inputting the set of expression changes of the first number of genes into at least one of first to eighth synthetic lethality prediction units, thereby training and generating the first to eighth synthetic lethality prediction units; and inputting the expression changes of the genes into at least one of the first to eighth synthetic lethality prediction units and outputting one or more genes having a new synthetic lethal relationship with a target gene.

Each of the first to eighth synthetic lethality prediction units may include a plurality of prediction models trained on genes from one of first to eighth cell lines, and may combine the output values of these prediction models to produce a final output value.

The feature selection unit may receive, as input, the expression changes of a plurality of genes in one of the first to eighth cell lines and output the number of input values suitable for predicting a synthetic lethal relationship.
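Figures 7 and 8 plot F1 score against the number of features, which suggests the first number can be chosen by comparing validation F1 across candidate feature counts. The Python sketch below illustrates that idea under the assumption that n is simply the F1-maximizing candidate; the disclosure does not state the exact selection rule, and `evaluate` is a hypothetical stand-in for training and validating a model on the top-n genes.

```python
from typing import Callable, Iterable, Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 score, with 1 = synthetic lethal and 0 = not synthetic lethal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def choose_feature_count(candidates: Iterable[int], evaluate: Callable[[int], float]) -> int:
    """Pick the candidate feature count n whose validation F1 is highest.

    `evaluate(n)` is a hypothetical callback: train a predictor on the top-n
    selected genes and return its F1 score on held-out gene pairs.
    """
    return max(candidates, key=evaluate)


# Toy usage with made-up F1 values (not results from the disclosure).
toy_f1 = {50: 0.61, 100: 0.72, 200: 0.70}
best_n = choose_feature_count(toy_f1, toy_f1.__getitem__)  # 100
```

Keeping the scoring step behind a callback leaves the choice of predictor (random forest, LightGBM, or another model) open, as the text does.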

The prediction models included in the first to eighth synthetic lethality prediction units may be trained using, as input values, the expression changes of the remaining genes after one of the pre-stored genes highly associated with synthetic lethality has been excluded.

The number of prediction models included in each of the first to eighth synthetic lethality prediction units may equal the number of genes highly associated with synthetic lethality.
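The one-model-per-excluded-gene scheme can be sketched as a leave-one-gene-out split. This is a minimal illustration, assuming that "excluding a gene" means dropping every labeled gene pair that contains it; the function name and the placeholder gene names are hypothetical.

```python
from typing import Dict, List, Tuple

GenePair = Tuple[str, str]


def leave_one_gene_out(pairs: List[GenePair], key_genes: List[str]) -> Dict[str, List[GenePair]]:
    """Build one training set per key gene, each with that gene left out.

    Assumption: "excluding a gene" is read as dropping every labeled pair
    that contains it. One model is then trained per returned set, so
    76 key genes give 76 training sets (and 76 models) per cell line.
    """
    return {gene: [pair for pair in pairs if gene not in pair] for gene in key_genes}


# Toy data with placeholder gene names.
pairs = [("GENE_A", "GENE_B"), ("GENE_C", "GENE_B"), ("GENE_D", "GENE_E")]
folds = leave_one_gene_out(pairs, ["GENE_A", "GENE_B"])
# folds["GENE_A"] -> [("GENE_C", "GENE_B"), ("GENE_D", "GENE_E")]
# folds["GENE_B"] -> [("GENE_D", "GENE_E")]
```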

The feature selection unit may be trained on expression-change data for the genes of a cell line that has been processed by either undersampling or oversampling, depending on a comparison between the number of genes in a synthetic lethal relationship and the number of genes not in a synthetic lethal relationship in that cell line.
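The text does not say which comparison result leads to undersampling versus oversampling, so the sketch below leaves the strategy as a parameter and shows only the two balancing operations. It uses plain random sampling as an assumption; other resampling schemes exist, and the function name is hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def balance_classes(
    positives: Sequence[T],
    negatives: Sequence[T],
    strategy: str,
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Equalize class sizes by random undersampling or oversampling.

    "under": randomly shrink the larger class to the size of the smaller one.
    "over":  randomly duplicate members of the smaller class (sampling with
             replacement) until it matches the larger one.
    """
    rng = random.Random(seed)
    pos, neg = list(positives), list(negatives)
    if strategy == "under":
        k = min(len(pos), len(neg))
        return rng.sample(pos, k), rng.sample(neg, k)
    if strategy == "over":
        k = max(len(pos), len(neg))
        return (
            pos + rng.choices(pos, k=k - len(pos)),
            neg + rng.choices(neg, k=k - len(neg)),
        )
    raise ValueError(f"unknown strategy: {strategy!r}")
```

For example, with 126 synthetic lethal pairs and 1,094 non-synthetic-lethal pairs, `"under"` yields 126 pairs per class and `"over"` yields 1,094 pairs per class.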

A synthetic lethality prediction device according to embodiments of the present disclosure includes a processor, a communication unit, and a memory, and the processor executes instructions stored in the memory to: receive data on expression changes of genes from an external database; determine, using a feature selection unit, a first number of input values suitable for predicting a synthetic lethal relationship; obtain, from the expression values of a target gene and the remaining genes, a set of input values equal in number to the first number; input the set of expression changes of the first number of genes into at least one of first to eighth synthetic lethality prediction units, thereby training and generating the first to eighth synthetic lethality prediction units; and input the expression changes of the genes into at least one of the first to eighth synthetic lethality prediction units and output one or more genes having a new synthetic lethal relationship with the target gene.

A computer program according to an embodiment of the present invention may be stored in a medium to cause a computer to execute any one of the methods according to the embodiments of the present invention.

In addition, other methods and other systems for implementing the present invention, and a computer-readable recording medium storing a computer program for executing the method, may further be provided.

Other aspects, features, and advantages beyond those described above will become apparent from the following drawings, claims, and detailed description of the invention.

According to any one of the above-described means for solving the problem, new synthetic lethal relationships beyond those previously discovered can be detected.

According to any one of the above-described means for solving the problem, newly identified synthetic lethal relationships can be used to discover targets and biomarkers for new drug development.

According to any one of the above-described means for solving the problem, new synthetic lethal relationships can be detected for each gene and used to discover targets for new drug development, increasing the probability of success of new drug development through biomarker-based clinical development.

Figure 1 is a block diagram of a synthetic lethality prediction device 100 according to embodiments of the present disclosure.
Figure 2 illustrates a synthetic lethality prediction network system including a server and user terminals according to an embodiment of the present disclosure.
Figure 3 is a detailed block diagram of the feature selection unit 120 and the synthetic lethality prediction unit 130 according to an embodiment of the present disclosure.
Figure 4 is a block diagram of the first sub synthetic lethality prediction unit 131.
Figure 5 is a block diagram of the feature selection learning device 200 that trains the feature selection unit.
Figure 6a is an example of a model evaluation table for cell line A549 according to an embodiment of the present disclosure.
Figure 6b is an example of a model evaluation table for cell line SLDB according to an embodiment of the present disclosure.
Figure 7 is a graph of F1 scores versus the number of features in the feature selection unit.
Figure 8 is a graph of F1 scores versus the number of features in the feature selection unit according to another embodiment.
Figure 9 is a block diagram of the prediction unit learning device 300 that trains the synthetic lethality prediction unit.
Figure 10 is an example diagram of a network environment including the synthetic lethality prediction device 100 according to embodiments of the present disclosure.
Figure 11 is a flowchart of a synthetic lethality prediction method according to embodiments of the present disclosure.

Hereinafter, the configuration and operation of the present invention will be described in detail with reference to the embodiments illustrated in the accompanying drawings.

Since the present invention may be modified in various ways and may have various embodiments, specific embodiments are illustrated in the drawings and described in detail in the detailed description. The effects and features of the present invention, and methods of achieving them, will become clear with reference to the embodiments described in detail below together with the drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various forms.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the description with reference to the drawings, identical or corresponding components are given the same reference numerals, and redundant descriptions of them are omitted.

In this specification, terms such as “learning” and “training” are not intended to refer to mental activities such as human education; they are to be interpreted as referring to machine learning performed through procedural computing.

In the following embodiments, terms such as “first” and “second” are used not in a limiting sense but to distinguish one component from another.

In the following embodiments, singular expressions include plural expressions unless the context clearly indicates otherwise.

In the following embodiments, terms such as “include” and “have” indicate the presence of the features or components described in the specification and do not preclude the possibility that one or more other features or components may be added.

In the drawings, components may be exaggerated or reduced in size for convenience of explanation. For example, the size and thickness of each component shown in the drawings are arbitrary, so the present invention is not necessarily limited to what is illustrated.

Where an embodiment can be implemented differently, a specific process sequence may be performed in an order different from the described order. For example, two processes described in succession may be performed substantially simultaneously, or in the reverse order.

In the present disclosure, a server controlling a user terminal may mean that the server, by communicating with the user terminal, provides data for outputs at the user terminal (output through any output device of the user terminal, such as screen display, sound output, vibration output, or lamp emission) and data for the user terminal to perform a predetermined operation. The server may also control the output at the user terminal using data previously stored in the user terminal; the meaning is not limited to the above example.

In the present disclosure, transmitting and receiving information or data to and from a user account may include transmitting and receiving information or data to and from a device (or user terminal) corresponding to or linked with that user account.

In the present disclosure, a user account corresponding to a user terminal may include a user account that has logged in to or accessed a service through the user terminal, and a user account whose information is stored on the user terminal. Likewise, the user terminal of a user account may mean a user terminal on which the user account is logged in, on which the user account information is stored, or through which the user account has accessed the service.

In the present disclosure, the synthetic lethality prediction device may be a user terminal, a server, a synthetic lethality prediction system, or a separate device.

In the present disclosure, synthetic lethality refers to a phenomenon in which cell death occurs when two genes are simultaneously knocked out, knocked down, or perturbed by drugs.

In the present disclosure, a cell line may refer to a population of cells that can be continuously cultured outside the body.

According to the present disclosure, new synthetic lethal relationships beyond those previously discovered can be detected.

According to the present disclosure, newly identified synthetic lethal relationships can be used to discover anticancer targets for new drug development.

According to the present disclosure, new synthetic lethal relationships can be detected for each gene and used to discover targets for new drug development, increasing the probability of success of new drug development through biomarker-based clinical development.

According to embodiments of the present disclosure, the synthetic lethality prediction device can newly predict genes that are in a synthetic lethal relationship with a given gene, using a model trained on the expression changes of genes.

Figure 1 is a block diagram of a synthetic lethality prediction device 100 according to embodiments of the present disclosure.

The synthetic lethality prediction device 100 may receive data on genes as input and output a set of genes whose synthetic lethal relationships are newly predicted from those genes.

The synthetic lethality prediction device 100 may include a data input unit 110 that receives data on the expression changes of genes.

The synthetic lethality prediction device 100 may include a feature selection unit 120 that outputs the data needed to predict new sets of genes in a synthetic lethal relationship. The feature selection unit 120 may be implemented by training on gene expression levels in cell lines. For example, the feature selection unit 120 may be trained using a total of eight cancer cell lines, A375, A549, HA1E, HCC515, HEPG2, HT29, MCF7, and PC3, and may thus include eight feature selection units. The feature selection unit 120 is not limited to these eight cancer cell lines and may be trained and implemented based on various other cell lines.

The feature selection unit 120 may be trained with data including the expression changes of genes as input. The feature selection unit 120 may be trained by supervised learning using a tree-based feature selection model, but is not limited thereto. More specifically, the feature selection unit 120 may select n genes ranked by the feature importance of a random forest model, where n is a natural number. Alternatively, the feature selection unit 120 may select the n genes using methods such as LightGBM, KNN, logistic regression, naive Bayes, or SVM. As the final result, a set of n genes may be output, and n and the resulting gene set may be determined differently for each cell line. Under supervised learning, the feature selection unit 120 may be trained by inputting the expression changes of genes together with sets of genes in a synthetic lethal relationship. Here, the data on the expression changes of genes may refer to data obtained from drug perturbation or gene interference experiments performed on the genes of target cells.

The data on the expression changes of genes may be obtained by acquiring raw data and converting it into an image, and may include quantile normalization results for landmark genes and inferred data for genes other than the landmark genes, in the form of an array or a 2X2 matrix. Here, the quantile normalization result refers to data produced by applying a correction process to raw experimental data. Experimental data may contain deviations depending on the testing environment, the institution performing the test, and the like; the quantile normalization result is the data after these deviations have been corrected.
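For the random-forest variant, selecting the n genes reduces to ranking precomputed importance scores. The sketch below assumes those scores already exist (scikit-learn's `RandomForestClassifier`, for instance, exposes them as `feature_importances_` after fitting); the function name, gene names, and scores are placeholders rather than values from the disclosure.

```python
from typing import List, Sequence


def select_top_n(gene_names: Sequence[str], importances: Sequence[float], n: int) -> List[str]:
    """Return the n genes with the highest importance scores.

    Ties keep the original column order because Python's sort is stable.
    """
    if len(gene_names) != len(importances):
        raise ValueError("gene_names and importances must have the same length")
    order = sorted(range(len(gene_names)), key=lambda i: -importances[i])
    return [gene_names[i] for i in order[:n]]


# Placeholder genes and importance scores.
genes = ["G1", "G2", "G3", "G4"]
scores = [0.1, 0.4, 0.4, 0.1]
top_two = select_top_n(genes, scores, 2)  # ["G2", "G3"]
```

Because n is passed in, the same helper can serve each cell line's own feature count.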

The synthetic lethality prediction device 100 may include a synthetic lethality prediction unit 130 that receives a set of gene expression changes and determines, through a prediction model trained by supervised learning, whether the genes are in a synthetic lethal relationship.

The synthetic lethality prediction unit 130 may be implemented by acquiring gene expression change data for each cell line and performing supervised learning separately for each cell line.

Even for a single cell line, a plurality of learning models may be implemented, and their results may be combined to output (return) whether a synthetic lethal relationship exists. Each model's result may indicate whether a synthetic lethal relationship exists (Y or N).

The learning models trained for a single cell line may each be trained with certain genes excluded, based on factors such as their association with a disease and whether they match a known synthetic lethal relationship. For example, the synthetic lethality prediction unit 130 may acquire data on 76 genes that are both in synthetic lethal relationships and associated with cancer, and train a learning model by supervised learning on a training data set from which one of the 76 genes has been excluded. Because each of the 76 genes is excluded in turn, 76 learning models may be generated. The synthetic lethality prediction unit 130 may thus generate, for each cell line, a plurality of learning models, each trained with one of the cancer-associated synthetic lethal genes excluded. The synthetic lethality prediction unit 130 may output the most frequent result among the results of the models trained on the first cell line as the first result value for that cell line. In the same way, the synthetic lethality prediction unit 130 may input data into the models trained on the first to eighth cell lines and output first to eighth result values.
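The "most frequent result" step is a majority vote over one cell line's Y/N predictions. A minimal sketch follows; the disclosure does not specify how ties are broken, so this version simply inherits `Counter.most_common`'s first-encountered ordering, and both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def majority_vote(predictions: List[str]) -> str:
    """Return the most frequent label ('Y' or 'N') among one cell line's models.

    On a tie, Counter.most_common returns the label encountered first;
    this tie-breaking behavior is an artifact of the sketch.
    """
    return Counter(predictions).most_common(1)[0][0]


def per_cell_line_results(votes: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each cell line to the majority label of its leave-one-out models."""
    return {cell_line: majority_vote(labels) for cell_line, labels in votes.items()}


# Toy votes: 3 models for A549, 76 models for MCF7 (made-up labels).
results = per_cell_line_results({"A549": ["N", "N", "Y"], "MCF7": ["Y"] * 40 + ["N"] * 36})
# {"A549": "N", "MCF7": "Y"}
```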

The determination unit 140 may receive the first to eighth result values as input and determine whether the genes are in a synthetic lethal relationship.
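The text leaves open how the determination unit 140 combines the eight per-cell-line results. Purely as an illustration, the sketch below assumes a vote-count threshold; both the rule and its default of 5 (a simple majority of 8) are hypothetical and are not taken from the disclosure.

```python
from typing import Dict

CELL_LINES = ("A375", "A549", "HA1E", "HCC515", "HEPG2", "HT29", "MCF7", "PC3")


def is_synthetic_lethal(results: Dict[str, str], min_votes: int = 5) -> bool:
    """Hypothetical decision rule for the determination unit.

    Counts how many of the eight per-cell-line results are 'Y' and calls
    the gene pair synthetic lethal when that count reaches `min_votes`.
    """
    missing = [c for c in CELL_LINES if c not in results]
    if missing:
        raise KeyError(f"missing results for: {missing}")
    yes_votes = sum(1 for c in CELL_LINES if results[c] == "Y")
    return yes_votes >= min_votes


# Toy input: four cell lines vote 'Y', four vote 'N'.
toy = {c: ("Y" if i < 4 else "N") for i, c in enumerate(CELL_LINES)}
decision = is_synthetic_lethal(toy)  # False under the default threshold
```

Exposing `min_votes` makes the stringency explicit: a higher threshold trades recall for fewer, more consistently supported candidate pairs.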

The learning model included in the feature selection unit 120 and/or the learning model included in the synthetic lethality prediction unit 130 may be trained on data for 1,574 genes: the 978 landmark genes used in the LINCS L1000 project plus 596 DNA repair genes. The data input to the learning model may include the results of about 3,000 gene interference experiments, but is not limited thereto, and various modifications are possible. The data input to the learning model may include quantile normalization results for the landmark genes and inferred data for genes other than the landmark genes.
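Quantile normalization, mentioned above as the correction step for the landmark genes, forces every sample to share the same value distribution. The sketch below is the generic textbook version in pure Python, not the LINCS L1000 processing pipeline; ties are assigned by rank order rather than averaged, which is a simplification.

```python
from typing import List


def quantile_normalize(samples: List[List[float]]) -> List[List[float]]:
    """Quantile-normalize equal-length samples (one list of gene values per sample).

    Each value is replaced by the mean, across samples, of the values that
    hold the same rank, so all samples end up with identical distributions.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of genes")
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the r-th smallest value across samples.
    reference = [sum(s[r] for s in sorted_samples) / len(samples) for r in range(n)]
    normalized = []
    for s in samples:
        ranks = sorted(range(n), key=lambda i: s[i])  # gene indices, smallest first
        out = [0.0] * n
        for r, gene_index in enumerate(ranks):
            out[gene_index] = reference[r]
        normalized.append(out)
    return normalized


# Two toy samples of three genes each.
normalized = quantile_normalize([[2.0, 4.0, 6.0], [1.0, 5.0, 3.0]])
# [[1.5, 3.5, 5.5], [1.5, 5.5, 3.5]]
```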

The learning model included in the feature selection unit 120 or in the synthetic lethality prediction unit 130 may be generated for known cell lines such as A375, A549, HA1E, HCC515, HEPG2, HT29, MCF7, and PC3, with a separate model generated for each cell line.

For example, the learning model for the A549 cell line may be trained using, as input, 126 gene pairs known to be synthetic lethal and 1,094 gene pairs that are not synthetic lethal.

The learning models for the remaining seven cancer cell lines, A375, HA1E, HCC515, HEPG2, HT29, MCF7, and PC3, may be trained using, as input, 1,164 synthetic lethal gene pairs and 416 gene pairs that do not cause synthetic lethality.

Each learning model may be trained using undersampling. That is, the learning model for the A549 cell line may be trained with 126 gene pairs per class, matching the 126 known synthetic lethal pairs, and the learning models for the remaining seven cancer cell lines, A375, HA1E, HCC515, HEPG2, HT29, MCF7, and PC3, may be trained with 416 gene pairs per class, matching the 416 non-synthetic-lethal pairs.
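Reading the undersampling step as shrinking the larger class down to the size of the smaller one (an interpretation of the text, not a verbatim statement), the resulting training-set sizes follow directly from the stated pair counts. The helper name is hypothetical.

```python
# Pair counts stated in the text; "sl" = known synthetic lethal pairs.
PAIR_COUNTS = {
    "A549": {"sl": 126, "non_sl": 1094},
    "other seven cell lines": {"sl": 1164, "non_sl": 416},
}


def undersampled_size(sl: int, non_sl: int) -> dict:
    """Class sizes after shrinking the larger class to the smaller one."""
    k = min(sl, non_sl)
    return {"per_class": k, "total": 2 * k, "discarded": sl + non_sl - 2 * k}


sizes = {name: undersampled_size(**counts) for name, counts in PAIR_COUNTS.items()}
# A549:                   per_class=126, total=252, discarded=968
# other seven cell lines: per_class=416, total=832, discarded=748
```

Note that the minority class differs between the two groups: for A549 it is the synthetic lethal pairs, while for the other seven cell lines it is the non-synthetic-lethal pairs.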

The learning model included in the feature selection unit 120 and/or the learning model included in the synthetic lethality prediction unit 130 may use the LightGBM algorithm or a neural network algorithm.

According to an embodiment of the present disclosure, whether genes are synthetic lethal can be predicted based on the expression changes of the genes selected by the feature selection unit 120. As a result, the synthetic lethal relationship of input genes can be predicted even from training data in which the number of synthetic lethal genes differs from the number of non-synthetic-lethal genes.

Figure 2 illustrates a synthetic lethality prediction network system including a server and user terminals according to an embodiment of the present disclosure.

The synthetic lethality prediction network system 1 of the present disclosure may include a server 20 and at least one user terminal 11 to 16. The server 20 may provide gene analysis services over a network, and may provide them to several of the user terminals 11 to 16 at the same time.

According to an embodiment of the present disclosure, the server 20 may be a single server, a group of servers, a cloud server, or the like, without being limited to these examples. The server 20 may include a database that stores data for gene analysis, gene mutation data, dependency score data, cell line data, expression level data, and the like. As described above, the server 20 may itself be the synthetic lethality prediction device.

According to an embodiment of the present disclosure, a network means a connection established (or formed) by any communication method, and may mean a communication network, connected by any communication method, that transmits and receives data between terminals or between a terminal and a server.

Any communication method includes communication through a given communication standard, frequency band, protocol, or channel. Examples include Bluetooth, BLE, Wi-Fi, Zigbee, 3G, LTE, and ultrasonic communication, covering short-range, long-range, wireless, and wired communication. These examples are not limiting.

According to an embodiment of the present disclosure, a short-range communication method means one in which communication is possible only when the communicating devices (terminals or servers) are within a predetermined range, such as Bluetooth or NFC. A long-range communication method means one in which devices can communicate regardless of distance. For example, two devices may communicate through a relay such as an access point (AP) even when they are farther apart than a predetermined distance, and long-range communication may include cellular networks (3G, LTE) used for SMS and voice calls. These examples are not limiting. Receiving a gene analysis service over a network means that the server and the terminal may communicate by any communication method.

Throughout the specification, the at least one user terminal 11 to 16 may include a personal computer 11, a tablet 12, a cellular phone 13, a laptop 14, a smartphone 15, and a TV 16, as well as other electronic devices such as personal digital assistants (PDAs), portable multimedia players (PMPs), navigation devices, MP3 players, digital cameras, refrigerators, washing machines, and vacuum cleaners, without being limited to these examples. As described above, any of the user terminals 11 to 16 may be the synthetic lethality prediction device.

According to an embodiment of the present disclosure, the gene analysis service may include providing a group of candidate genes that are synthetic lethal with a target gene, providing cell line data related to the target gene, indicating whether a target gene and a candidate gene are synthetic lethal, and indicating the degree of similarity between a target gene and a candidate gene, without being limited to these examples.

According to an embodiment of the present disclosure, the server 20 may take gene expression change data as input and determine whether a pair of genes can be in a synthetic lethal relationship. To do so, the server 20 may use a feature selection unit trained on the gene expression changes of each cell line, and a synthetic lethality prediction unit trained on the expression changes of as many genes as the number of features determined by the feature selection unit. The synthetic lethality prediction unit may include a plurality of prediction models, each trained with one of a set of designated genes excluded. The designated genes may be chosen based on their association with diseases such as cancer and on whether they have synthetic lethal relationships; data on them may be stored in advance or obtained by requesting it from an external database. The prediction models may be generated by excluding a different designated gene each time: for example, with 10 designated genes, excluding each one in turn yields 10 prediction models. The server 20 may then take the most frequent result among the outputs of the prediction models as the final result.

According to an embodiment of the present disclosure, one of the user terminals 11 to 16 may likewise take gene expression change data as input and determine whether a pair of genes can be in a synthetic lethal relationship. It may use the same components described above for the server 20: a feature selection unit trained on per-cell-line gene expression changes, and a synthetic lethality prediction unit made of prediction models each trained with a different designated gene excluded (for example, 10 models for 10 designated genes). The user terminal may take the most frequent result among the outputs of these prediction models as the final result.

Additionally, according to an embodiment of the present disclosure, the synthetic lethality prediction network system 1 as a whole may take gene expression change data as input and determine whether a pair of genes can be in a synthetic lethal relationship. It may use the same components described above for the server 20: a feature selection unit trained on per-cell-line gene expression changes, and a synthetic lethality prediction unit made of prediction models each trained with a different designated gene excluded (for example, 10 models for 10 designated genes). The synthetic lethality prediction network system 1 may take the most frequent result among the outputs of these prediction models as the final result.

FIG. 3 is a detailed block diagram of the feature selection unit 120 and the synthetic lethality prediction unit 130 according to an embodiment of the present disclosure.

The synthetic lethality prediction device 100 according to embodiments of the present disclosure may be trained on eight cell lines and implemented as shown in FIG. 3. Although FIG. 3 shows eight sub-feature selection units, namely the first to eighth sub-feature selection units 121, 122, 123, 124, 125, 126, 127, and 128, the device is not limited to this and may include any number of sub-feature selection units.

The first sub-feature selection unit 121 is trained on gene expression change data from the A375 cell line and, as a result of training, may output the number of genes to be used for predicting synthetic lethality from A375 data.

The second sub-feature selection unit 122 is trained on gene expression change data from the A549 cell line and, as a result of training, may output the number of genes to be used for predicting synthetic lethality from A549 data.

The third sub-feature selection unit 123 is trained on gene expression change data from the HA1E cell line and, as a result of training, may output the number of genes to be used for predicting synthetic lethality from HA1E data.

The fourth sub-feature selection unit 124 is trained on gene expression change data from the HCC515 cell line and, as a result of training, may output the number of genes to be used for predicting synthetic lethality from HCC515 data.

The fifth sub-feature selection unit 125 is trained on gene expression change data from the HEPG2 cell line and, as a result of training, may output the number of genes to be used for predicting synthetic lethality from HEPG2 data.

The sixth sub-feature selection unit 126 is trained on gene expression change data from the HT29 cell line and, as a result of training, may output the number of genes to be used for predicting synthetic lethality from HT29 data.

The seventh sub-feature selection unit 127 is trained on gene expression change data from the MCF7 cell line and, as a result of training, may output the number of genes to be used for predicting synthetic lethality from MCF7 data.

The eighth sub-feature selection unit 128 is trained on gene expression change data from the PC3 cell line and, as a result of training, may output the number of genes to be used for predicting synthetic lethality from PC3 data.

To determine whether the input genes are in a synthetic lethal relationship, the feature selection unit 120 may compute the output values of each of the first to eighth sub-feature selection units 121 to 128. These eight output values may be passed to the synthetic lethality prediction unit 130 and used to predict the synthetic lethal relationship.

The synthetic lethality prediction unit 130 may output the synthetic lethal relationship of the input genes using the first to eighth synthetic lethality sub-prediction units 131, 132, 133, …, 138, each trained on one of eight cell lines: A375, A549, HA1E, HCC515, HEPG2, HT29, MCF7, and PC3.

The first synthetic lethality sub-prediction unit 131 is trained with gene expression changes obtained from the A375 cell line as input, and the genes used for its training may be selected according to the output of the first sub-feature selection unit 121. When it receives gene expression changes for the A375 cell line, the first synthetic lethality sub-prediction unit 131 may take as input the expression changes of the genes corresponding to the output of the first sub-feature selection unit 121 (such as the set or number of selected genes) and output the synthetic lethal relationship of the genes.

The second synthetic lethality sub-prediction unit 132 is trained with gene expression changes obtained from the A549 cell line as input, and the genes used for its training may be selected according to the output of the second sub-feature selection unit 122. When it receives gene expression changes for the A549 cell line, the second synthetic lethality sub-prediction unit 132 may take as input the expression changes of the genes corresponding to the output of the second sub-feature selection unit 122 (such as the set or number of selected genes) and output the synthetic lethal relationship of the genes.

The third synthetic lethality sub-prediction unit 133 is trained with gene expression changes obtained from the HA1E cell line as input, and the genes used for its training may be selected according to the output of the third sub-feature selection unit 123. When it receives gene expression changes for the HA1E cell line, the third synthetic lethality sub-prediction unit 133 may take as input the expression changes of the genes corresponding to the output of the third sub-feature selection unit 123 (such as the set or number of selected genes) and output the synthetic lethal relationship of the genes.

The fourth synthetic lethality sub-prediction unit 134 is trained with gene expression changes obtained from the HCC515 cell line as input, and the genes used for its training may be selected according to the output of the fourth sub-feature selection unit 124. When it receives gene expression changes for the HCC515 cell line, the fourth synthetic lethality sub-prediction unit 134 may take as input the expression changes of the genes corresponding to the output of the fourth sub-feature selection unit 124 (such as the set or number of selected genes) and output the synthetic lethal relationship of the genes.

The fifth synthetic lethality sub-prediction unit 135 is trained with gene expression changes obtained from the HEPG2 cell line as input, and the genes used for its training may be selected according to the output of the fifth sub-feature selection unit 125. When it receives gene expression changes for the HEPG2 cell line, the fifth synthetic lethality sub-prediction unit 135 may take as input the expression changes of the genes corresponding to the output of the fifth sub-feature selection unit 125 (such as the set or number of selected genes) and output the synthetic lethal relationship of the genes.

The sixth synthetic lethality sub-prediction unit 136 is trained with gene expression changes obtained from the HT29 cell line as input, and the genes used for its training may be selected according to the output of the sixth sub-feature selection unit 126. When it receives gene expression changes for the HT29 cell line, the sixth synthetic lethality sub-prediction unit 136 may take as input the expression changes of the genes corresponding to the output of the sixth sub-feature selection unit 126 (such as the set or number of selected genes) and output the synthetic lethal relationship of the genes.

The seventh synthetic lethality sub-prediction unit 137 is trained with gene expression changes obtained from the MCF7 cell line as input, and the genes used for its training may be selected according to the output of the seventh sub-feature selection unit 127. When it receives gene expression changes for the MCF7 cell line, the seventh synthetic lethality sub-prediction unit 137 may take as input the expression changes of the genes corresponding to the output of the seventh sub-feature selection unit 127 (such as the set or number of selected genes) and output the synthetic lethal relationship of the genes.

The eighth synthetic lethality sub-prediction unit 138 is trained with gene expression changes obtained from the PC3 cell line as input, and the genes used for its training may be selected according to the output of the eighth sub-feature selection unit 128. When it receives gene expression changes for the PC3 cell line, the eighth synthetic lethality sub-prediction unit 138 may take as input the expression changes of the genes corresponding to the output of the eighth sub-feature selection unit 128 (such as the set or number of selected genes) and output the synthetic lethal relationship of the genes.

The synthetic lethality prediction unit 130 may determine whether the input genes are synthetic lethal by combining the results output from the first to eighth synthetic lethality sub-prediction units 131, 132, 133, …, 138.
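The routing described above, where each cell line's sub-feature selection unit picks the genes its sub-prediction unit reads, can be sketched as follows. This is an illustration only: the dictionary layout, the function name `predict_synthetic_lethal`, and above all the combination rule (a simple majority vote over cell lines, ties resolved to "not synthetic lethal") are assumptions, since the description only says the eight results are combined.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

# Hypothetical type: a trained per-cell-line sub-prediction unit that reads the
# expression changes of its selected genes (in selector order), returns 0 or 1.
Predictor = Callable[[Sequence[float]], int]


def predict_synthetic_lethal(
    expression: Dict[str, Dict[str, float]],
    selected_genes: Dict[str, List[str]],
    predictors: Dict[str, Predictor],
) -> int:
    """Combine per-cell-line sub-predictions into a single label.

    expression[cell_line][gene]: expression-change feature for the queried gene
    pair in that cell line (hypothetical layout).
    selected_genes[cell_line]: gene set output by that cell line's sub-feature
    selection unit. Combination rule (majority vote, ties -> 0) is assumed.
    """
    votes = []
    for cell_line, genes in selected_genes.items():
        if cell_line not in expression:
            continue  # no data for this cell line: skip its sub-prediction unit
        x = [expression[cell_line][g] for g in genes]  # keep only selected genes
        votes.append(predictors[cell_line](x))
    if not votes:
        raise ValueError("no cell line had data for this query")
    counts = Counter(votes)
    return 1 if counts[1] > counts[0] else 0
```

Keeping the selected gene list per cell line lets each sub-prediction unit use a different number of inputs, which matches the per-cell-line feature counts described for the feature selection units.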

FIG. 4 is a block diagram of the first synthetic lethality sub-prediction unit 131.

As shown in FIG. 4, the first synthetic lethality sub-prediction unit 131 may include a 1-1s sub-prediction unit 131S1, a 1-2s sub-prediction unit 131S2, …, a 1-76s sub-prediction unit 131S76, and a first prediction value determination unit 131-o. Like the first synthetic lethality sub-prediction unit 131, each of the second to eighth synthetic lethality sub-prediction units may also include 76 sub-prediction units and a prediction value determination unit.

To obtain results that are independent of genes that are both synthetic lethal and highly disease-associated, the first synthetic lethality sub-prediction unit 131 may train its sub-prediction units while excluding some of these genes. Such genes are stored in advance; FIG. 4 shows an implementation with 76 of them.

The first synthetic lethality sub-prediction unit 131 may train the 1-1s sub-prediction unit 131S1 on gene expression change data from which the first of the 76 genes is excluded. Proceeding in the same way, it trains the 1-ks sub-prediction unit 131Sk on gene expression change data from which the k-th of the 76 genes is excluded, thereby generating the 1-1s to 1-76s sub-prediction units. Here, k is a natural number from 1 to 76.

As a result, the gene sets input to the 1-1s to 1-76s sub-prediction units 131S1, 131S2, 131S3, …, 131S76 overlap partially but never completely. Using 76 sub-prediction units trained on these differing gene sets yields a model that is largely independent of the individual genes strongly associated with synthetic lethality.

The first prediction value determination unit 131-o may output, as the final result, the result that occurs most frequently among the outputs of the 1-1s to 1-76s sub-prediction units 131S1, 131S2, 131S3, …, 131S76 (that is, a majority vote).
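The leave-one-gene-out training and the majority vote can be sketched as below. Two points are assumptions rather than statements from the description: "excluding the k-th gene" is read here as dropping every training pair that involves that gene (removing it as an input feature would be an equally plausible reading), and `fit` is a hypothetical callback standing in for whatever learner (for example LightGBM) trains each sub-prediction unit.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]
Example = Tuple[Pair, Sequence[float], int]  # (gene pair, features, label)
Model = Callable[[Sequence[float]], int]


def train_leave_one_gene_out(
    examples: List[Example],
    designated_genes: List[str],
    fit: Callable[[List[Example]], Model],
) -> List[Model]:
    """Train one sub-model per designated gene (76 genes -> 76 sub-models).

    Assumed reading: sub-model k never sees a training pair containing gene k.
    `fit` is a hypothetical trainer that returns a callable model.
    """
    models = []
    for gene in designated_genes:
        subset = [ex for ex in examples if gene not in ex[0]]
        models.append(fit(subset))
    return models


def majority_vote(models: List[Model], x: Sequence[float]) -> int:
    """Return the label predicted most often (ties resolved to 0)."""
    counts = Counter(m(x) for m in models)
    return 1 if counts[1] > counts[0] else 0
```

Because every sub-model omits a different gene, no single designated gene can dominate all of the votes, which is the independence property the description aims for.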

FIG. 5 is a block diagram of a feature selection learning device 200 that trains the feature selection unit.

The feature selection learning device 200 may train the feature selection unit and the sub-feature selection units so as to produce ones that meet a predetermined quality.

The data input unit 210 receives the gene data required for training, such as gene expression changes for the cell lines.

The data input to the data input unit 210 may include results for various cancer cell lines drawn from approximately 3,000 drug perturbation and gene interference (knockdown) experiments conducted in the LINCS L1000 project on 1,574 genes (978 landmark genes plus 596 DNA repair genes). Quantile-normalized values for the landmark genes and inferred values for genes other than the landmark genes may be input to the data input unit 210.
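The description mentions quantile-normalized values for the landmark genes. Quantile normalization, in its general form, forces every sample to share the same value distribution by replacing each value with the mean, across samples, of the values at the same rank. The sketch below illustrates only that general technique and is not the LINCS L1000 processing code; the sample-by-gene list layout is an assumption, and ties are broken by position for simplicity.

```python
from statistics import mean
from typing import List


def quantile_normalize(samples: List[List[float]]) -> List[List[float]]:
    """Quantile-normalize samples (one list per sample, one value per gene).

    After normalization every sample contains exactly the same set of values:
    the per-rank means of the sorted input samples. Ties are broken by position.
    """
    n_genes = len(samples[0])
    if any(len(s) != n_genes for s in samples):
        raise ValueError("all samples must cover the same genes")
    # Mean of the r-th smallest value across all samples, for each rank r.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [mean(s[r] for s in sorted_samples) for r in range(n_genes)]
    normalized = []
    for s in samples:
        order = sorted(range(n_genes), key=lambda i: s[i])  # gene indices by value
        out = [0.0] * n_genes
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]  # each gene gets the mean for its rank
        normalized.append(out)
    return normalized
```

For two samples `[5, 2, 3]` and `[4, 1, 6]`, the per-rank means are 1.5, 3.5, and 5.5, so the outputs are `[5.5, 1.5, 3.5]` and `[3.5, 1.5, 5.5]`.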

The model learning unit 220 may train the feature selection unit using the input data. It may train a different feature selection unit for each cell line, and may train one or more feature selection units. The model learning unit 220 may preprocess the input data by undersampling, oversampling, or similar methods before training the feature selection unit, and may generate one or more feature selection units using methods such as LightGBM, KNN, logistic regression, naive Bayes, random forest, SVM, and EXP2SL.

The model evaluation unit 230 may evaluate the trained feature selection units and select those that perform well. As shown in FIG. 6A, the model evaluation unit 230 may compare the accuracy, area under the ROC curve (AUROC), Matthews correlation coefficient (MCC), F1 score, and other metrics of feature selection units trained on A549 cell line data to determine the best-performing model. The model evaluation unit 230 may finally adopt the feature selection units generated by LightGBM and EXP2SL. The performance of a feature selection unit may vary with the cell line of the input data. As shown in FIG. 6B, when trained on the SLDB dataset, the feature selection unit generated by LightGBM may show the best performance.
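For reference, the four metrics used to compare models can be computed from their standard definitions as sketched below, in plain Python so the formulas are visible. The function names are hypothetical; in practice, scikit-learn's `accuracy_score`, `f1_score`, `matthews_corrcoef`, and `roc_auc_score` compute the same quantities.

```python
import math
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, F1, and MCC from a binary confusion matrix (label 1 = SL)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Unlike accuracy, MCC and F1 stay informative when the classes are imbalanced, which is relevant given the unequal numbers of synthetic lethal and non-synthetic-lethal pairs.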

FIG. 7 is a graph of F1 scores versus the number of features in the feature selection unit.

As a result of training on data for the A549 cell line, the feature selection learning device 200 may generate a feature selection unit that predicts synthetic lethal relationships using a set of 20 genes.

FIG. 8 is a graph of F1 scores versus the number of features in the feature selection unit according to another embodiment.

As a result of training on data for another cell line, the feature selection learning device 200 may generate a feature selection unit that predicts synthetic lethal relationships using a set of 60 genes.
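One way to arrive at a gene count such as 20 or 60 from curves like those in FIGS. 7 and 8 is to rank genes by importance, evaluate the F1 score of the top-k genes for several k, and keep the best k. The description does not specify the ranking or the search, so the sketch below is hypothetical: `ranked_genes` (for example, ordered by a LightGBM feature importance) and the `evaluate_f1` callback are assumptions.

```python
from typing import Callable, List, Optional, Sequence


def choose_feature_count(
    ranked_genes: List[str],
    candidate_counts: Sequence[int],
    evaluate_f1: Callable[[List[str]], float],
) -> Optional[int]:
    """Return the candidate k whose top-k gene subset gives the highest F1.

    ranked_genes: genes sorted by importance, most important first (assumed).
    evaluate_f1: hypothetical callback that trains and validates a model on the
    given gene subset and returns its F1 score.
    Ties go to the smaller k, i.e. the model with fewer input genes.
    """
    best_k, best_f1 = None, float("-inf")
    for k in sorted(candidate_counts):
        f1 = evaluate_f1(ranked_genes[:k])
        if f1 > best_f1:  # strict '>' keeps the smaller k on ties
            best_k, best_f1 = k, f1
    return best_k
```

Preferring the smaller k on ties keeps the input set, and therefore the downstream synthetic lethality prediction unit, as compact as the data allow.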

FIG. 9 is a block diagram of the prediction unit learning device 300, which trains the synthetic lethality prediction unit.

The prediction unit learning device 300 may train the synthetic lethality prediction unit and the sub-synthetic lethality prediction unit to generate a synthetic lethality prediction unit and a sub-synthetic lethality prediction unit having at least a predetermined accuracy.

The data input unit 310 inputs the gene data required for training. The data input unit 310 may input data such as gene expression changes for cell lines.

The data input to the data input unit 310 may include results for various cancer cell lines, based on the results of more than 3,000 drug perturbation and gene interference experiments conducted in the LINCS L1000 project on 1,574 genes, namely 978 landmark genes plus 596 DNA repair genes. Quantile-normalized results for the landmark genes and inferred data for genes other than the landmark genes may be input to the data input unit 310.
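The text states that landmark-gene values are quantile-normalized. As a hedged illustration, the sketch below shows textbook quantile normalization on plain Python lists. It is not the LINCS processing code, which may differ in details such as tie handling. Here ties are broken by gene index rather than averaged, and the gene-count constants simply restate the 978 + 596 = 1,574 arithmetic from the paragraph.

```python
from typing import List

# Gene universe stated in the text: 978 landmark genes + 596 DNA repair genes.
N_LANDMARK, N_DNA_REPAIR = 978, 596
N_GENES = N_LANDMARK + N_DNA_REPAIR  # 1574


def quantile_normalize(profiles: List[List[float]]) -> List[List[float]]:
    """Quantile-normalize expression profiles so they share one value distribution.

    profiles[j][i] is the value of gene i in profile (sample) j.
    Ties are broken by gene index rather than averaged.
    """
    if not profiles:
        return []
    n = len(profiles[0])
    if any(len(p) != n for p in profiles):
        raise ValueError("all profiles must cover the same genes")
    sorted_profiles = [sorted(p) for p in profiles]
    # Reference distribution: mean of the k-th smallest value across all profiles.
    reference = [sum(sp[k] for sp in sorted_profiles) / len(profiles) for k in range(n)]
    normalized = []
    for p in profiles:
        order = sorted(range(n), key=lambda i: p[i])  # gene indices by ascending value
        out = [0.0] * n
        for rank, gene_idx in enumerate(order):
            out[gene_idx] = reference[rank]
        normalized.append(out)
    return normalized
```

After normalization every profile contains exactly the same set of values, so differences between profiles reflect only the rank order of genes within each experiment.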

The model learning unit 320 may train the synthetic lethality prediction unit using the input data. The model learning unit 320 may train a different synthetic lethality prediction unit for each cell line. The synthetic lethality prediction unit for each cell line may be trained as an ensemble model including a plurality of learning models. The model learning unit 320 may train one or more synthetic lethality prediction units. The model learning unit 320 may process the input data by methods such as under-sampling and over-sampling before training the synthetic lethality prediction unit. The model learning unit 320 may train and generate one or more synthetic lethality prediction units using methods such as LightGBM, KNN, logistic regression, naive Bayes, random forest, SVM, and EXP2SL.
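The under-sampling and over-sampling step can be sketched with random resampling. This is an assumption-laden illustration: the disclosure names the two techniques but not their exact variants, so plain random under-sampling of the majority class and random over-sampling (with replacement) of the minority class are used here. The `rebalance` function and its `strategy` argument are hypothetical, and the rule for choosing between the two strategies is left to the caller because the criterion is not stated.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def rebalance(samples: Sequence[T], labels: Sequence[int], strategy: str,
              seed: int = 0) -> Tuple[List[T], List[int]]:
    """Balance synthetic-lethal (1) and non-synthetic-lethal (0) examples.

    strategy="under": randomly drop majority-class examples.
    strategy="over":  randomly duplicate minority-class examples.
    """
    if strategy not in ("under", "over"):
        raise ValueError("strategy must be 'under' or 'over'")
    rng = random.Random(seed)
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    min_label = 1 if minority is pos else 0  # decided before any resampling
    if strategy == "under":
        majority = rng.sample(majority, len(minority))
    else:
        minority = minority + rng.choices(minority, k=len(majority) - len(minority))
    combined = [(s, min_label) for s in minority] + [(s, 1 - min_label) for s in majority]
    rng.shuffle(combined)
    return [s for s, _ in combined], [y for _, y in combined]
```

Under-sampling discards information but keeps training fast, while over-sampling keeps every example at the risk of overfitting to repeated minority cases; this trade-off is one plausible reason to switch between them based on the class counts of each cell line.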

FIG. 10 is an exemplary diagram of a network environment including the synthetic lethality prediction device 100 according to embodiments of the present disclosure.

At least two of the synthetic lethality prediction device 100, the feature selection learning device 200, the synthetic lethality prediction learning device 300, the database 400, and the user terminal 500 may be connected via a network and communicate with one another.

The synthetic lethality prediction device 100 may receive a trained feature selection unit from the feature selection learning device 200 and input data into the feature selection unit. The synthetic lethality prediction device 100 may receive a trained synthetic lethality prediction unit from the synthetic lethality prediction learning device 300 and input data into the synthetic lethality prediction unit. Using the feature selection unit and the synthetic lethality prediction unit, the synthetic lethality prediction device 100 may determine and output new synthetic lethal relationships between genes. The output data may be transmitted to and stored in the database 400. The feature selection learning device 200 may receive, from the synthetic lethality prediction device 100 or the database 400, newly input data such as gene expression change data and whether a synthetic lethal relationship exists, and may perform training again with the received data. Likewise, the synthetic lethality prediction learning device 300 may receive such newly input data from the synthetic lethality prediction device 100 or the database 400 and perform training again with it. The synthetic lethality prediction device 100 may receive the feature selection unit and the synthetic lethality prediction unit generated by this retraining and update them periodically.
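The periodic update loop can be pictured as version-based model synchronization. The following is a minimal in-memory sketch under stated assumptions: `ModelStore` is a hypothetical stand-in for the learning devices 200 and 300 publishing retrained models, `PredictionDevice` is a hypothetical stand-in for the device 100, and networking, persistence, and the timer that would call `refresh` periodically are omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class ModelStore:
    """Hypothetical stand-in for learning devices 200/300 publishing retrained models."""
    published: Dict[str, Tuple[int, Any]] = field(default_factory=dict)

    def publish(self, name: str, model: Any) -> int:
        # Each retraining round bumps the version number for that model name.
        version = self.published.get(name, (0, None))[0] + 1
        self.published[name] = (version, model)
        return version


@dataclass
class PredictionDevice:
    """Hypothetical stand-in for device 100 holding local copies of the models."""
    local: Dict[str, Tuple[int, Any]] = field(default_factory=dict)

    def refresh(self, store: ModelStore) -> List[str]:
        # Pull only models whose published version is newer than the local copy.
        updated = []
        for name, (version, model) in store.published.items():
            if version > self.local.get(name, (0, None))[0]:
                self.local[name] = (version, model)
                updated.append(name)
        return updated
```

Comparing version numbers lets the prediction device poll frequently without reloading models that have not changed since the last retraining round.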

To predict new synthetic lethal relationships, the synthetic lethality prediction device 100 may periodically receive, from the database 400, data on the expression changes of the genes to be input to the feature selection unit and the synthetic lethality prediction unit.

FIG. 11 is a flowchart of a synthetic lethality prediction method according to embodiments of the present disclosure.

In S110, the synthetic lethality prediction device may receive data on expression changes of genes from an external database.

In S120, the synthetic lethality prediction device may use the feature selection unit to determine a first number of input values suitable for predicting a synthetic lethal relationship.

In S130, the synthetic lethality prediction device may obtain, from the expression changes of the genes, a set of input values equal in number to the first number.

In S140, the synthetic lethality prediction device may input the set of expression changes of the first number of genes into at least one of first to eighth synthetic lethality prediction units, thereby training and generating the first to eighth synthetic lethality prediction units.

In S150, the synthetic lethality prediction device may input the expression changes of the genes into at least one of the first to eighth synthetic lethality prediction units and output one or more genes having a new synthetic lethal relationship with the target gene.
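Steps S110 to S150 can be strung together in a short inference sketch. Several assumptions apply. Each candidate partner gene is represented by a feature vector. `ranked_feature_idx` is a hypothetical feature ranking, of which the first `k` entries (the "first number" from S120) are kept. Each ensemble member is any callable returning a synthetic-lethality probability. The members' outputs are combined by simple averaging (soft voting), which is one possible way to combine output values and not necessarily the disclosed one. The training in S140 is assumed to have already produced `ensemble`.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

Model = Callable[[Sequence[float]], float]  # returns P(synthetic lethal with the target gene)


def predict_partners(
    features_by_gene: Dict[str, Sequence[float]],  # S110: data received from the database
    ranked_feature_idx: Sequence[int],             # hypothetical feature ranking
    k: int,                                        # S120: first number of input values
    ensemble: List[Model],                         # S140: already-trained prediction models
    threshold: float = 0.5,
) -> List[str]:
    """Return candidate genes predicted to be synthetic lethal with the target gene."""
    if not ensemble:
        raise ValueError("at least one trained model is required")
    selected = list(ranked_feature_idx[:k])        # S130: keep k input values
    scores: Dict[str, float] = {}
    for gene, feats in features_by_gene.items():
        x = [feats[i] for i in selected]
        scores[gene] = mean(m(x) for m in ensemble)  # combine model outputs
    hits = [g for g, s in scores.items() if s >= threshold]
    return sorted(hits, key=lambda g: scores[g], reverse=True)  # S150: ranked output
```

For instance, with two toy models `lambda x: x[0]` and `lambda x: (x[0] + x[1]) / 2`, the function returns the hypothetical genes whose averaged score clears the threshold, highest score first.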

The devices described above may be implemented with hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications running on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described; however, those of ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be permanently or temporarily embodied in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to limited embodiments and drawings, those of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

A synthetic lethality prediction method, comprising:
receiving, by a synthetic lethality prediction device, data on expression changes of genes from an external database;
determining, using a feature selection unit, a first number of input values suitable for predicting a synthetic lethal relationship;
obtaining, from the expression changes of the genes, a set of input values equal in number to the first number;
inputting the set of expression changes of the first number of genes into at least one of first to eighth synthetic lethality prediction units, thereby training and generating the first to eighth synthetic lethality prediction units; and
inputting the expression changes of the genes into at least one of the first to eighth synthetic lethality prediction units, and outputting one or more genes having a new synthetic lethal relationship with a target gene.

The method of claim 1, wherein
the first to eighth synthetic lethality prediction units
each include a plurality of prediction models trained with genes in one of first to eighth cell lines, and
output a final output value by combining output values of the plurality of prediction models.

The method of claim 1, wherein
the feature selection unit
receives, as input, expression changes of a plurality of genes in one of the first to eighth cell lines and outputs, as an output value, the number of input values suitable for predicting a synthetic lethal relationship.

The method of claim 2, wherein
each of the plurality of prediction models included in the first to eighth synthetic lethality prediction units
is a model trained by excluding one of the genes highly associated with cancer development and taking the expression changes of the remaining genes as input.

The method of claim 4, wherein
the plurality of prediction models included in the first to eighth synthetic lethality prediction units
are generated in a number equal to the number of genes highly associated with the synthetic lethality.

The method of claim 1, wherein
the feature selection unit
is trained on data on the expression changes of genes in a cell line, the data being processed by either undersampling or oversampling depending on a result of comparing the number of genes having synthetic lethality with the number of genes not having synthetic lethality in the cell line.

A synthetic lethality prediction device comprising a processor, a communication unit, and a memory,
wherein the processor executes instructions stored in the memory to:
receive data on expression changes of genes from an external database;
determine, using a feature selection unit, a first number of input values suitable for predicting a synthetic lethal relationship;
obtain a set of input values equal in number to the first number from the expression values of a target gene and the remaining genes;
input a set of expression changes of the first number of genes into at least one of first to eighth synthetic lethality prediction units, thereby training and generating the first to eighth synthetic lethality prediction units; and
input the expression changes of the genes into at least one of the first to eighth synthetic lethality prediction units and output one or more genes having a new synthetic lethal relationship with the target gene.

The device of claim 7, wherein
the first to eighth synthetic lethality prediction units
each include a plurality of prediction models trained with genes in one of first to eighth cell lines, and
output a final output value by combining output values of the plurality of prediction models.

The device of claim 7, wherein
the feature selection unit
is trained to receive, as input, expression changes of a plurality of genes in one of the first to eighth cell lines and to output, as an output value, the number of input values suitable for predicting a synthetic lethal relationship.

The device of claim 8, wherein
each of the plurality of prediction models included in the first to eighth synthetic lethality prediction units
is a model trained by excluding one of the genes highly associated with the stored synthetic lethality and taking the expression changes of the remaining genes as input.

The device of claim 10, wherein
the plurality of prediction models included in the first to eighth synthetic lethality prediction units
are generated in a number equal to the number of genes highly associated with the synthetic lethality.

The device of claim 7, wherein
the feature selection unit
is trained on data on the expression changes of genes in a cell line, the data being processed by either undersampling or oversampling depending on a result of comparing the number of genes having synthetic lethality with the number of genes not having synthetic lethality in the cell line.

A computer program stored in a computer-readable storage medium to execute the method of claim 1 using a computer.