KR102363456B1

KR102363456B1 - Method and system for analyzing genome and medical information and developing drug component based on artificial intelligence

Info

Publication number: KR102363456B1
Application number: KR1020210063476A
Authority: KR
Inventors: 김원태; 김동민; 강신욱; 이명재
Original assignee: (주)제이엘케이
Priority date: 2021-05-17
Filing date: 2021-05-17
Publication date: 2022-02-16
Also published as: WO2022245063A1

Abstract

The present invention relates to the analysis of genome and medical information and the development of drug components based on artificial intelligence. A system for analyzing a genome and medical information and providing a drug component development platform comprises: an input unit for acquiring first data indicating a nucleotide sequence of a genome and second data including medical information; a first analysis unit for analyzing the first data and identifying the structure of a protein produced by the genome based on the analysis result for the data; a second analysis unit for identifying information about symptoms occurring in a subject by analyzing the second data; a design unit which generates a structure of a compound for use as a drug component based on the structure of the protein, the analysis result for the first data, and the analysis result for the second data, and analyzes the properties of the compound; and an output unit for outputting the analysis results for the data and information related to the structure and properties of the compound. Therefore, the genome and medical information can be analyzed more effectively.

Description

AI-based genome and medical information analysis and pharmaceutical substance development method and system

The present invention relates to the analysis of genome and medical information and the development of pharmaceutical substances, and more particularly to a method and system for analyzing genome and medical information based on artificial intelligence and developing related pharmaceutical substances.

Modern medicine is making remarkable progress. However, new diseases also continue to emerge. Infectious diseases in particular can pose a great threat to human society. It is therefore necessary to study diseases effectively and to develop new drugs against them based on the research results.

In general, developing a new drug takes a great deal of time and money. For example, an analysis procedure is carried out to obtain the basic data for new drug development, a new drug substance with medicinal efficacy is designed on that basis, and the design is then verified. All of these steps involve chemical experiments, which demand considerable effort and time.

M. Skalic et al., "From Target to Drug: Generative Modeling for the Multimodal Structure-Based Ligand Design", Molecular Pharmaceutics, 16:4282-4291. (2019.08.22.)

An object of the present invention is to provide a method and system for effectively analyzing genes and medical information related to a disease and for developing pharmaceutical substances.

Another object of the present invention is to provide a method and system for analyzing genes based on artificial intelligence.

Another object of the present invention is to provide a method and system for analyzing medical information based on artificial intelligence.

Another object of the present invention is to provide a method and system for developing pharmaceutical substances based on artificial intelligence.

The technical problems to be solved by the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present invention pertains.

A system for providing a platform for genome and medical information analysis and pharmaceutical substance development according to an embodiment of the present invention may include: an input unit that acquires first data representing a nucleotide sequence of a genome and second data including medical information; a first analysis unit that analyzes the first data and identifies, based on the analysis result for the data, the structure of a protein produced by the genome; a second analysis unit that identifies information on symptoms occurring in a subject by analyzing the second data; a design unit that generates the structure of a compound for use as a pharmaceutical substance based on the structure of the protein, the analysis result for the first data, and the analysis result for the second data, and analyzes the properties of the compound; and an output unit that outputs the analysis results for the data and information related to the structure and properties of the compound. Here, the analysis unit and the design unit may operate using at least one artificial neural network.

A method for genome and medical information analysis and pharmaceutical substance development according to an embodiment of the present invention may include: acquiring first data representing a nucleotide sequence of a genome and second data including medical information; analyzing the first data and identifying, based on the analysis result for the data, the structure of a protein produced by the genome; identifying information on symptoms occurring in a subject by analyzing the second data; generating the structure of a compound for use as a pharmaceutical substance based on the structure of the protein, the analysis result for the first data, and the analysis result for the second data; analyzing the properties of the compound; and outputting the analysis results for the data and information related to the structure and properties of the compound. Here, the analysis of the data, the identification of the protein structure, the generation of the compound structure, and the analysis of the compound properties may be performed using at least one artificial neural network.

The features briefly summarized above are merely exemplary aspects of the detailed description of the invention that follows, and do not limit the scope of the invention.

According to the present invention, genes and medical information can be analyzed more effectively, and pharmaceutical substances for responding to related diseases can be developed more effectively.

The effects obtainable from the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present invention pertains.

FIG. 1 is a diagram illustrating a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the structure of a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating an operation procedure of a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating an example of a genome analysis interface provided by a genome analysis platform according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating an example of a pharmaceutical substance development interface provided by a pharmaceutical substance development platform according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating the structure of an artificial neural network applicable to a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating a connection structure of artificial neural networks applicable to a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating a procedure for performing transfer learning in a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice them. However, the present invention may be embodied in many different forms and is not limited to the embodiments described herein.

In describing the embodiments of the present invention, detailed descriptions of well-known configurations or functions are omitted where they could obscure the gist of the invention. In the drawings, parts unrelated to the description of the invention are omitted, and similar parts are given similar reference numerals.

Accordingly, the present invention proposes a technique for diagnosing and analyzing a disease. Furthermore, the present invention proposes a technique for designing a compound usable as a pharmaceutical substance, that is, a candidate substance, for responding to the diagnosed and analyzed disease.

FIG. 1 is a diagram illustrating a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.

Referring to FIG. 1, the genome analysis and pharmaceutical substance development system includes a local device 110a, a local device 110b, and a server 120 connected to a communication network. FIG. 1 illustrates two local devices 110a and 110b.

The local device 110a and the local device 110b are used by users who want to diagnose and analyze a disease using the system. The local device 110a and the local device 110b may acquire input data, transmit the input data to the server 120 through the communication network, and receive data including the analysis results from the server 120.

The server 120 provides a platform for diagnosing and analyzing a disease and for designing a compound to be used as a pharmaceutical substance according to embodiments of the present invention, and executes the diagnosis, analysis, and design algorithms. According to various embodiments, these algorithms may be based on artificial intelligence. Based on data received from at least one of the local device 110a and the local device 110b, the server 120 performs operations such as molecular diagnosis, genetic analysis, medical information analysis, and compound design for pharmaceutical substances, and transmits the resulting data to at least one of the local device 110a and the local device 110b. For example, the server 120 may be a cloud server.

As described above, according to one embodiment, the local device 110a and the local device 110b act as terminals that perform data input and output, and the server 120 performs the diagnosis, analysis, and design functions. According to another embodiment, the local device 110a and the local device 110b may perform at least part of the computation for diagnosis, analysis, and design. The share of that computation may differ from one local device to another. Furthermore, according to yet another embodiment, a local device may include all of the functions of the server 120. In this case, the local device 110a or the local device 110b can perform the diagnosis, analysis, and design operations even when it is not connected to a communication network.

As described with reference to FIG. 1, the server 120 may provide a platform for the analysis and diagnosis of diseases and the development and design of pharmaceutical substances. The platform according to an embodiment forms a green zone for new drug development and may provide disease diagnosis and prediction services. Furthermore, the platform according to an embodiment provides a bioinformatics service based on genomic big data using artificial intelligence.

FIG. 2 is a diagram illustrating the structure of a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention. The components illustrated in FIG. 2 may be included in one of a local device (e.g., the local device 110a or the local device 110b of FIG. 1) and a server (e.g., the server 120 of FIG. 1), and how each component is distributed between the local device and the server may vary according to various embodiments. Accordingly, the connections between components may be based on internal circuitry or on an external communication network.

Referring to FIG. 2, the system includes a data input unit 210, a genetic information analysis unit 220, a medical information analysis unit 230, a design unit 240, and an output unit 250.

The data input unit 210 is a means for inputting the data to be analyzed. Here, the data includes genomic data and medical information data. The medical information data is the product of a subject's medical examination and may include diagnostic charts, images (e.g., X-ray, computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET)), and the like. The genomic data includes at least one of data related to a subject (e.g., a patient) or data related to a virus or bacterium that causes a disease. Specifically, the genomic data includes information on the DNA or RNA nucleotide sequence of the subject or of the virus or bacterium. For example, the data input unit 210 may receive genomic data through an external communication network, or may receive genomic data through local hardware (e.g., a memory port or a user input device). A user who wants to use the system may input genomic data through the data input unit 210. In this case, the genomic data may be input as a file structured according to a predefined format.
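The source does not name the predefined file format. As a minimal sketch, assuming the widely used FASTA text format purely for illustration, sequence input could be read as follows. The function name `parse_fasta` is hypothetical and is not part of the described system.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {record_name: sequence} dict.

    Assumption: FASTA is used here only as an example of a
    "predefined format"; the source does not specify one.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: the record name is the first token after '>'.
            header = line[1:].split()
            name = header[0] if header else f"record{len(records) + 1}"
            records[name] = []
        elif name is not None:
            # Sequence line: may be wrapped over several lines.
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


example = ">seq1 demo record\nACGT\nacgu\n>seq2\nGG\n"
sequences = parse_fasta(example)
```

Wrapped sequence lines are joined and upper-cased so that later analysis steps see one continuous DNA or RNA string per record.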

The genetic information analysis unit 220 analyzes the genomic data input through the data input unit 210. By analyzing the genomic data, the genetic information analysis unit 220 can identify the genetic characteristics present in the nucleotide sequence of the subject or of the virus or bacterium, and can estimate the protein structures or intracellular activity that may arise from the nucleotide sequence. Accordingly, the protein structure generation unit 226 may acquire data related not only to diseases originating from the subject's genes but also to diseases caused by the invasion of external viruses or bacteria. To this end, the genetic information analysis unit 220 includes a genome analysis unit 222, a molecular diagnosis analysis unit 224, and a protein structure generation unit 226.

The genome analysis unit 222 may analyze the genomic data using various techniques. The genome analysis unit 222 determines genetic characteristics by analyzing the subject's genomic data, and can identify variations of specific nucleotide sequences in the subject's genome. For example, the genome analysis unit 222 may analyze the genomic data using various techniques such as single nucleotide polymorphism (SNP) analysis, single strand conformation polymorphism (SSCP) analysis, amplified fragment length polymorphism (AFLP) analysis, random amplified polymorphic DNA (RAPD) analysis, allele-specific PCR (AS-PCR) analysis, dynamic allele-specific hybridization (DASH) analysis, whole-genome sequencing (WGS) analysis, and next generation sequencing (NGS) analysis.
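To make the idea of identifying single-nucleotide variations concrete, the sketch below compares a sample sequence against a reference position by position. It assumes the two sequences are already aligned and of equal length, which real variant-calling pipelines do not take for granted; the function name is hypothetical and this is not the method used by the genome analysis unit 222.

```python
def find_single_nucleotide_differences(reference, sample):
    """Return (position, reference_base, sample_base) for each mismatch.

    Positions are 1-based. Assumption: the sequences are pre-aligned and
    have the same length, so no insertions or deletions are handled.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    differences = []
    for index, (ref_base, sample_base) in enumerate(zip(reference, sample)):
        if ref_base != sample_base:
            differences.append((index + 1, ref_base, sample_base))
    return differences


variants = find_single_nucleotide_differences("ACGTAC", "ACCTAT")
```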

The molecular diagnosis analysis unit 224 performs molecular diagnosis based on the genomic data. In other words, the molecular diagnosis analysis unit 224 identifies, based on the genomic data, the various molecular-level activities that may occur in the subject's cells. Specifically, the molecular diagnosis analysis unit 224 detects molecular-level changes occurring in cells through numerical values or images. For example, the molecular diagnosis analysis unit 224 may perform nucleic acid analysis (e.g., of DNA or RNA), protein analysis, intracellular metabolome analysis, and the like.

The protein structure generation unit 226 predicts, based on the genomic data, the structure of a protein that can be produced by the subject's genes. That is, the protein structure generation unit 226 predicts the structure of the disease-causing protein. If genomic data of a virus or bacterium is available, the protein structure generation unit 226 can also predict the structure of a protein that can be produced by the genes of the virus or bacterium.
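Structure prediction typically works from the amino acid sequence that a coding region encodes. The sketch below is not the structure-prediction network itself; it only illustrates that preliminary step by translating a DNA coding sequence with the standard genetic code. The function name `translate` is hypothetical, and the sketch assumes the input starts in frame.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# '*' marks a stop codon.
_BASES = "TCAG"
_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA coding sequence into a protein string.

    RNA input is accepted by mapping U to T. Translation stops at the
    first stop codon; a trailing partial codon is ignored.
    """
    sequence = dna.upper().replace("U", "T")
    protein = []
    for start in range(0, len(sequence) - 2, 3):
        amino_acid = CODON_TABLE.get(sequence[start:start + 3], "X")  # X = unknown
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


peptide = translate("ATGGCCTAA")
```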

As described above, the genetic information analysis unit 220 includes the genome analysis unit 222, the molecular diagnosis analysis unit 224, and the protein structure generation unit 226. The analysis operations of the genetic information analysis unit 220 are performed on genomic data (e.g., an electronic file) rather than on actual genetic material. Accordingly, the analysis can be completed faster and at lower cost than an analysis using actual genetic material. To analyze the vast amount of genomic data effectively, the genetic information analysis unit 220 may use an artificial neural network (ANN) based on machine learning or deep learning. In this case, the genetic information analysis unit 220 may use a pre-trained artificial neural network, or may train an initial artificial neural network itself and then use it.

The medical information analysis unit 230 analyzes the medical information data. By analyzing the medical information data, the medical information analysis unit 230 can obtain information on the symptoms of a disease occurring in the subject's body. The medical information data may include image data. In this case, for effective analysis, the medical information analysis unit 230 may filter an image to improve its quality or to enhance its key information. For example, the filtering may include at least one of a fast Fourier transform (FFT), histogram equalization, motion artifact removal, or noise cancelling. The medical information analysis unit 230 may then extract feature points from the medical image, determine a symptom based on the extracted feature points, and convert the determined symptom into data. That is, based on the analysis result of the medical image, the medical information analysis unit 230 may determine the subject's condition, whether the subject is infected, the progress of the disease, the severity of the disease, and the like.
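As a minimal sketch of one of the filtering options named above, histogram equalization of an 8-bit grayscale image can be written in plain Python as follows. The function name `equalize_histogram` and the list-of-rows image representation are assumptions made for illustration; the source does not specify an implementation.

```python
def equalize_histogram(image, levels=256):
    """Spread the intensity histogram of a grayscale image over the full range.

    `image` is a list of rows of integer pixel values in [0, levels - 1].
    Returns a new image; the input is not modified.
    """
    pixels = [value for row in image for value in row]

    # Histogram of pixel intensities.
    histogram = [0] * levels
    for value in pixels:
        histogram[value] += 1

    # Cumulative distribution function (CDF) of the histogram.
    cdf = []
    running_total = 0
    for count in histogram:
        running_total += count
        cdf.append(running_total)

    cdf_min = next(value for value in cdf if value > 0)
    total = len(pixels)
    if total == cdf_min:
        # Every pixel has the same intensity; nothing to equalize.
        return [row[:] for row in image]

    # Lookup table mapping each old intensity to its equalized intensity.
    lookup = [
        max(0, round((cdf[level] - cdf_min) / (total - cdf_min) * (levels - 1)))
        for level in range(levels)
    ]
    return [[lookup[value] for value in row] for row in image]


low_contrast = [[50, 50], [100, 150]]
equalized = equalize_histogram(low_contrast)
```

After equalization the darkest input intensity maps to 0 and the brightest to `levels - 1`, which stretches a low-contrast image across the available range while preserving the ordering of intensities.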

The design unit 240 designs a compound that can be used as a pharmaceutical substance based on the genomic data and medical information input through the data input unit 210, the analysis result of the genomic data from the genetic information analysis unit 220, and the analysis result of the medical information from the medical information analysis unit 230. To this end, the design unit 240 includes a characteristic compound generation unit 242 and an ADMET (Absorption, Distribution, Metabolism, Excretion, Toxicity) prediction unit 244.

The characteristic compound generation unit 242 generates the structure of a characteristic compound corresponding to the protein structure predicted by the protein structure generation unit 226. The characteristic compound is generated so as to have a structure capable of interacting with (e.g., binding to) the predicted protein, and so as to have properties capable of suppressing the disease by acting on the predicted protein.

The ADMET prediction unit 244 performs absorption, distribution, metabolism, excretion, and toxicity tests on the compound generated by the characteristic compound generation unit 242. In other words, the ADMET prediction unit 244 analyzes the ADMET profile and the mechanism of drug action of the generated compound. That is, based on the acquired genomic data, the ADMET prediction unit 244 analyzes how the compound generated by the characteristic compound generation unit 242 acts in the subject's cells. In doing so, the ADMET prediction unit 244 may build a mathematical model of the subject's characteristics based on the acquired genomic data and generate the ADMET profile based on that model.

As described above, the design unit 240 includes the characteristic compound generation unit 242 and the ADMET prediction unit 244. The analysis operations of the design unit 240 are performed on compound data (e.g., an electronic file) rather than on actual compounds. For effective design, the genetic information analysis unit 220 may use an artificial neural network based on machine learning or deep learning. In this case, the analysis result for the genome generated by the genetic information analysis unit 220 and the analysis result for the medical image generated by the medical information analysis unit 230 may be used as inputs to the artificial neural network. Accordingly, the design can be completed faster and at lower cost than an analysis using actual compounds. In this case, the design unit 240 may use a pre-trained artificial neural network, or may train an initial artificial neural network itself and then use it.

The output unit 250 outputs the results generated by the genetic information analysis unit 220, the medical information analysis unit 230, and the design unit 240. For example, the output unit 250 may output the results as a tangible object (e.g., a printout) or as a digital file. In this case, the digital file may be stored in a storage means within the system or transmitted through an external communication network to a pre-designated destination (e.g., a designated e-mail address, a designated telephone number, or a designated application).
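The data flow among the units of FIG. 2 can be sketched as a small pipeline. This is only an illustration of how the units' inputs and outputs connect: every class, field, and function name below is hypothetical, and the stub functions stand in for the neural-network-based analysis that the description attributes to each unit.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AnalysisPipeline:
    """Wires together stand-ins for units 220, 230, and 240 of FIG. 2."""

    analyze_genome: Callable[[str], dict]           # stands in for unit 220
    analyze_medical: Callable[[dict], dict]         # stands in for unit 230
    design_compound: Callable[[dict, dict], dict]   # stands in for unit 240

    def run(self, genome_data: str, medical_data: dict) -> dict:
        """Take the inputs (unit 210) and return the bundle to output (unit 250)."""
        genome_result = self.analyze_genome(genome_data)
        medical_result = self.analyze_medical(medical_data)
        # The design step consumes both analysis results.
        design_result = self.design_compound(genome_result, medical_result)
        return {
            "genome_analysis": genome_result,
            "medical_analysis": medical_result,
            "compound_design": design_result,
        }


# Placeholder stubs; a real system would call trained models here.
pipeline = AnalysisPipeline(
    analyze_genome=lambda sequence: {"sequence_length": len(sequence)},
    analyze_medical=lambda record: {"image_count": len(record.get("images", []))},
    design_compound=lambda genome, medical: {"inputs_used": [genome, medical]},
)
report = pipeline.run("ACGTACGT", {"images": ["scan_1"]})
```

Passing the units in as callables keeps the sketch agnostic about where each one runs, which matches the description's point that components may sit on a local device or on the server.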

In the structure described with reference to FIG. 2, at least one of the functions of the genetic information analysis unit 220 and the design unit 240, and at least some of the functions of the data input unit 210 and the output unit 250, may be performed by a processor. In addition, although not shown in FIG. 2, the system may include interface means for interaction with a user. For example, the system may include input means such as a keyboard, a mouse, or a touchscreen, and display means such as a monitor, a touchscreen, or a projector. In the following description, input from the user and interfaces shown to the user are to be understood as being entered through the input means and displayed through the display means.

도 3은 본 발명의 일 실시 예에 따른 유전체 분석 및 의약 물질 개발 시스템의 동작 절차를 나타내는 도면이다. 이하 설명의 편의를 위해 도 3의 동작들을 수행하는 주체는 '시스템'이라 지칭된다.3 is a diagram illustrating an operation procedure of a system for analyzing a genome and developing a drug substance according to an embodiment of the present invention. Hereinafter, for convenience of description, a subject performing the operations of FIG. 3 is referred to as a 'system'.

Referring to FIG. 3, in step 301, the system acquires genomic data and medical information. The acquired genomic data may include DNA or RNA nucleotide sequence data of the subject. It may additionally include DNA or RNA nucleotide sequence data of a virus or bacterium that causes a disease.

In step 303, the system analyzes the genomic data. The system may apply various genome analysis techniques to the genomic data. The system may analyze the genomic data with an artificial neural network, using the genomic data as the input. To this end, the system may process the genomic data into a form that can be fed to the artificial neural network. Through this analysis, the system may estimate the genetic characteristics of the subject or of the virus/bacterium, molecular activity, the structure of the proteins produced, and the like. Because an artificial neural network is used, the system can obtain analysis data on the genome without performing actual chemical experiments.
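As an illustration of processing genomic data into a form that a neural network can accept, the following minimal sketch one-hot encodes a DNA sequence. The encoding scheme, the `ALPHABET` constant, and the function name `one_hot_encode` are assumptions made for illustration; the description does not specify how the data is processed.

```python
from typing import List

# Assumed alphabet for DNA; RNA input would need "U" in place of "T".
ALPHABET = "ACGT"

def one_hot_encode(sequence: str) -> List[List[int]]:
    """Map each base to a 4-element one-hot vector.

    Bases outside the alphabet (e.g. the ambiguity code N) become all zeros.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = []
    for base in sequence.upper():
        vector = [0] * len(ALPHABET)
        if base in index:
            vector[index[base]] = 1
        encoded.append(vector)
    return encoded

print(one_hot_encode("ACGN"))
```

Each row of the result would correspond to one position in the sequence, so a network with a fixed number of input nodes would also need padding or windowing, which this sketch omits.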

In step 305, the system analyzes the medical information. In particular, according to an embodiment, the system may analyze a medical image of the subject and identify symptoms based on the analysis result. Specifically, the system may filter the image, extract feature points, and determine the symptoms based on the extracted feature points. The system may use an artificial neural network for this purpose. For example, the system may use a convolutional neural network (CNN) to identify and classify lesions. A different artificial neural network may be used depending on the type of image.

In step 307, the system designs a characteristic compound and analyzes the properties of the designed compound. That is, based on the result of the analysis performed in step 303 and the result of the medical information analysis performed in step 305, the system designs a compound that reacts specifically with the subject or with the virus/bacterium. For example, the system may design at least one of: a compound that interacts with a disease-causing protein produced by the subject's genes, a compound that destroys the virus/bacterium, a compound that inhibits the activity of a protein produced by the virus/bacterium, and a compound that inhibits protein production by the virus/bacterium. The designed compound may be referred to as a candidate substance, a candidate drug, or the like. The system then generates an ADMET profile for the at least one designed compound. That is, the system may run a simulation to predict how the designed compound acts within the subject's cells. In other words, the system performs a genome reactivity test, a side-effect test, an activity prediction, and the like for the candidate substance. The operation of step 307 may be performed on the basis of artificial intelligence using an artificial neural network. Because an artificial neural network is used, the system can obtain compounds and analysis data on those compounds without performing actual chemical experiments.

In step 309, the system outputs the analysis result. The analysis result may be output as a tangible object (e.g., a printout) or as a digital file. The digital file may be stored in a storage means within the system or transmitted over an external communication network to a pre-designated destination (e.g., a designated e-mail address, a designated phone number, or a designated application).

The operations described with reference to FIG. 3 have been described as being performed by the system. However, these operations may be performed by a single device (e.g., the local device 110a or 110b, or the server 120) or by two or more devices. When two or more devices are involved, an operation in which the devices exchange the necessary information may be added between some of the operations. For example, when step 301 is performed by a local device and step 303 is performed by the server, an operation in which the local device transmits the genomic data to the server may be added before step 303.

According to the embodiment described with reference to FIG. 3, the system designs a characteristic compound and analyzes the properties of the designed compound. Although not shown in FIG. 3, if the analyzed properties show that the compound does not achieve the reference effect required of a drug substance, or that it has unacceptable side effects, the system may exclude the compound from the candidate substances. In this case, the system may design another compound, or select a different one from among a plurality of previously designed compounds, and then analyze its properties. This operation may be repeated until a compound meeting the criteria is determined or until a predetermined failure condition is satisfied.
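The exclude-and-retry procedure described above can be sketched as a simple loop. Everything named here is hypothetical: the `analyze` callback stands in for the property analysis, the `efficacy` and `toxicity` keys stand in for the reference effect and side-effect measures, and a maximum attempt count is assumed as the predetermined failure condition.

```python
from typing import Callable, Dict, Iterable, Optional

def select_candidate(
    candidates: Iterable[str],
    analyze: Callable[[str], Dict[str, float]],
    min_efficacy: float,
    max_toxicity: float,
    max_attempts: int,
) -> Optional[str]:
    """Return the first candidate that meets both thresholds, or None on failure."""
    for attempt, compound in enumerate(candidates):
        if attempt >= max_attempts:  # assumed failure condition: attempt budget spent
            break
        profile = analyze(compound)
        if profile["efficacy"] >= min_efficacy and profile["toxicity"] <= max_toxicity:
            return compound
        # Otherwise the compound is excluded and the next one is examined.
    return None

# Toy property profiles standing in for an ADMET-style analysis result.
TOY_PROFILES = {
    "c1": {"efficacy": 0.2, "toxicity": 0.1},  # too weak
    "c2": {"efficacy": 0.9, "toxicity": 0.8},  # unacceptable side effects
    "c3": {"efficacy": 0.8, "toxicity": 0.2},  # meets both criteria
}

print(select_candidate(["c1", "c2", "c3"], TOY_PROFILES.__getitem__, 0.5, 0.3, 10))
```

Passing a generator as `candidates` would cover the case where new compounds are designed on demand rather than drawn from a previously designed set.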

A user who wishes to perform disease diagnosis and analysis or drug substance development using the system may enter the necessary data and issue commands such as analysis requests through a local device. In response, the local device or the server performs the operations described with reference to FIG. 3 and provides the results to the user. The local device may provide interfaces according to various embodiments for user interaction such as data input and command input. That is, the local device may display various interfaces using the display means. Examples of such interfaces are described below with reference to FIGS. 4 and 5.

FIG. 4 is a diagram illustrating an example of a genome analysis interface provided by a genome analysis platform according to an embodiment of the present invention. Referring to FIG. 4, the genome analysis interface includes a menu 410, a search bar 420, and function items 430. Each of the function items 431 to 439 may consist of an image and a name representing an analysis target or technique. For example, the function items 430 include at least one of an epigenomics analysis item 431, an exome analysis item 432, a genome-wide association study (GWAS) analysis item 433, a metabolomics analysis item 434, a metagenomics analysis item 435, a proteomics analysis item 436, a target sequencing item 437, a transcriptome analysis item 438, and a whole-genome sequencing (WGS) analysis item 439. After entering the genomic data, the user can start the corresponding analysis technique by clicking or selecting the item for the function to be used.

With an interface such as that of FIG. 4, tools for next-generation sequencing analysis can be provided on the basis of the extracted genomic data, including whole-genome analysis, prediction of associations between microRNA target genes and diseases, large-scale transcriptome analysis, and gene interaction network analysis. Such a platform can be understood as a technology that fuses IT with disease diagnosis and drug substance development, following the trend toward big data and convergence, and that makes it possible to use the data that grows substantially every year as NGS technology improves. That is, the present invention is based on analysis software for analyzing and interpreting vast amounts of genomic information in various fields, and can provide researchers and companies with a platform that they can apply in various industries.

FIG. 5 is a diagram illustrating an example of a drug substance development interface provided by a drug substance development platform according to an embodiment of the present invention. Referring to FIG. 5, the drug substance development interface includes a menu 510, a search bar 520, and function items 530. Each of the function items 530 may consist of an image and a name representing an analysis target or technique. For example, the function items 530 include at least one of an ADME (absorption, distribution, metabolism, excretion) analysis item 531, a Basic Science Research B item 532, a biomarker development item 533, a lead identification item 534, a lead optimization item 535, a protein interaction analysis item 536, a target validation item 537, and a toxicity analysis item 538. After the analysis result for the genomic data has been derived, the user can start the corresponding analysis technique by clicking or selecting the item for the function to be used for drug substance development.

The pharmaceutical industry has growth potential, but existing new drug development projects are poorly linked from one stage to the next and are disconnected from industrial practice, which causes problems and limits tangible results. Uncertainties at every stage, from candidate substance derivation through optimization, non-clinical studies, and clinical studies, increase the likelihood of market failure. Accordingly, the present invention aims to provide a big-data-based platform that uses artificial intelligence. Developers are thereby expected to be able to improve the efficiency of the new drug development process.

As described above, the system and platform according to embodiments of the present invention provide functions for analyzing a genome and developing a drug substance based on the analysis result. The computational algorithms for analysis and development operate on data rather than on actual genome samples. Furthermore, these algorithms may be implemented on the basis of artificial intelligence (AI).

Artificial intelligence refers to a machine such as a computer performing the thinking, learning, and analysis that human intelligence makes possible. Technologies that apply artificial intelligence to the medical industry have recently been increasing. Artificial neural networks are widely used to implement artificial intelligence. An example of an artificial neural network applicable to the present invention is shown in FIG. 6.

FIG. 6 is a diagram illustrating the structure of an artificial neural network applicable to a genome analysis and drug substance development system according to an embodiment of the present invention. Referring to FIG. 6, the artificial neural network consists of an input layer 610, at least one hidden layer 620, and an output layer 630. Each of the layers 610, 620, and 630 consists of a plurality of nodes, and each node is connected to the output of at least one node belonging to the previous layer. Each node takes the inner product of the output values of the previous layer's nodes and the corresponding connection weights, adds a bias, applies a non-linear activation function to the result, and passes the resulting output value to at least one node of the next layer.
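The per-node computation can be sketched as follows: a weighted sum of the previous layer's outputs plus a bias, with a non-linear activation applied to that sum, which is the usual formulation. The choice of `tanh` as the activation and the function names are assumptions for illustration; the description does not name a specific activation function.

```python
import math
from typing import List

def node_output(inputs: List[float], weights: List[float], bias: float) -> float:
    """Inner product of inputs and weights, plus bias, passed through tanh."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return math.tanh(z)  # assumed non-linear activation

def layer_output(
    inputs: List[float], weight_rows: List[List[float]], biases: List[float]
) -> List[float]:
    """Compute every node of one layer; each row of weights belongs to one node."""
    return [node_output(inputs, w, b) for w, b in zip(weight_rows, biases)]

hidden = layer_output([1.0, 2.0], [[0.5, -0.25], [0.1, 0.1]], [0.0, 0.0])
print(hidden)
```

Stacking calls to `layer_output` would give the forward pass from the input layer through the hidden layers to the output layer.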

The artificial neural network model used in various embodiments of the present invention may include, but is not limited to, at least one of a fully convolutional neural network, a convolutional neural network, a recurrent neural network, a restricted Boltzmann machine (RBM), and a deep belief neural network (DBN). Machine learning methods other than deep learning may also be included, as may hybrid models that combine deep learning and machine learning. For example, a deep-learning-based model may be applied to extract features from an image, and a machine-learning-based model may be applied to classify or recognize the image based on the extracted features. The machine-learning-based model may include, but is not limited to, a support vector machine (SVM), AdaBoost, and the like.

An artificial neural network configured similarly to the example of FIG. 6 may be used. The system according to an embodiment of the present invention may hold a plurality of artificial neural networks, which may be distinguished by function. For example, the plurality of artificial neural networks may include at least one artificial neural network for gene analysis and at least one artificial neural network for drug substance development. The at least one artificial neural network for gene analysis may in turn be divided into at least one artificial neural network for analyzing the genome of a subject and at least one artificial neural network for analyzing the genome of a virus/bacterium. Furthermore, the plurality of artificial neural networks may include at least one artificial neural network for selecting the artificial neural network to be used for analysis or development.

FIG. 7 is a diagram illustrating a connection structure of artificial neural networks applicable to a genome analysis and drug substance development system according to an embodiment of the present invention.

Referring to FIG. 7, the system includes a first artificial neural network set 710 including a plurality of artificial neural networks for genome analysis, a second artificial neural network set 720 including a plurality of artificial neural networks for medical image data analysis, and a third artificial neural network set 730 including a plurality of artificial neural networks for drug substance development.

The plurality of artificial neural networks included in the first artificial neural network set 710 may be classified by function (e.g., the analysis method, or whether the analysis target is the subject or a virus/bacterium). For example, the first artificial neural network set 710 may include artificial neural networks corresponding to each of the items 431 to 439 illustrated in FIG. 4. The artificial neural networks included in the first artificial neural network set 710 may also have different structures. Specifically, they may differ from one another in the number of input nodes, the form of the input values, and the like. Accordingly, the system may identify the user's command through the user interface and process the input genomic data according to the form of the input values required by the artificial neural network corresponding to the identified command.
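The selection of a network by user command, followed by command-specific processing of the genomic data, can be sketched as a registry lookup. The registry keys, the two preprocessing functions, and the stub models below are all hypothetical placeholders; real networks and their input formats are not specified in the description.

```python
from typing import Callable, Dict, List, Tuple

Preprocessor = Callable[[str], List[float]]
Model = Callable[[List[float]], str]

def base_counts(sequence: str) -> List[float]:
    """Fixed-length input: counts of A, C, G and T."""
    return [float(sequence.upper().count(base)) for base in "ACGT"]

def gc_windows(sequence: str, window: int = 4) -> List[float]:
    """Variable-length input: GC fraction of each non-overlapping window."""
    seq = sequence.upper()
    return [
        sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(0, len(seq) - window + 1, window)
    ]

# Hypothetical registry: user command -> (preprocessor, stub model).
REGISTRY: Dict[str, Tuple[Preprocessor, Model]] = {
    "exome": (base_counts, lambda x: f"exome model received {len(x)} inputs"),
    "wgs": (gc_windows, lambda x: f"wgs model received {len(x)} inputs"),
}

def run_analysis(command: str, sequence: str) -> str:
    """Look up the network for the command and shape the data for it."""
    preprocess, model = REGISTRY[command]
    return model(preprocess(sequence))

print(run_analysis("exome", "ACGT"))
print(run_analysis("wgs", "GGCCAATT"))
```

Keeping the preprocessor next to its model in one registry entry is one way to ensure the data always matches the input form that the selected network expects.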

The plurality of artificial neural networks included in the second artificial neural network set 720 may be classified by function (e.g., the analysis method, or whether the analysis target is the subject or a virus/bacterium). For example, the second artificial neural network set 720 may include artificial neural networks corresponding to the types of medical image (e.g., X-ray, CT, MRI). The artificial neural networks included in the second artificial neural network set 720 may also have different structures. Specifically, they may differ from one another in the number of input nodes, the form of the input values, and the like. Accordingly, the system may identify the user's command through the user interface and process the input medical image data according to the form of the input values required by the artificial neural network corresponding to the identified command.

The plurality of artificial neural networks included in the third artificial neural network set 730 may be classified by function (e.g., the analysis method, or whether the analysis target is the subject or a virus/bacterium). For example, the third artificial neural network set 730 may include artificial neural networks corresponding to each of the items 531 to 538 illustrated in FIG. 5. The artificial neural networks included in the third artificial neural network set 730 may also have different structures. Specifically, they may differ from one another in the number of input nodes, the form of the input values, and the like. Accordingly, the system may identify the user's command through the user interface and process the input genome analysis result data and medical image analysis result data according to the form of the input values required by the artificial neural network corresponding to the identified command and the data form of the previously obtained analysis results.

The artificial neural networks included in the second artificial neural network set 720 may be trained using medical image data. For reasons such as the protection of personal information, it may be difficult to secure enough medical image data for training. Accordingly, the artificial neural networks included in the second artificial neural network set 720 may be trained using training data obtained through data augmentation. In particular, data augmentation may be performed on the lesion data within the medical image data. That is, the system may obtain training data by extracting a lesion region from the input medical image data and performing data augmentation on the data for the extracted lesion region. For example, data augmentation may include rotation, scaling, and the like.
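A minimal sketch of rotation and scaling applied to an extracted lesion patch follows, using plain nested lists as a stand-in for image data. The restriction to 90-degree rotations and integer nearest-neighbor upscaling is a simplifying assumption, as are the function names; a production pipeline would typically use an image library and arbitrary angles and scale factors.

```python
from typing import List

Patch = List[List[int]]

def rotate90(patch: Patch) -> Patch:
    """Rotate a 2-D patch 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]

def scale_nearest(patch: Patch, factor: int) -> Patch:
    """Upscale by an integer factor using nearest-neighbor repetition."""
    scaled = []
    for row in patch:
        wide = [value for value in row for _ in range(factor)]
        scaled.extend(list(wide) for _ in range(factor))
    return scaled

def augment(patch: Patch) -> List[Patch]:
    """Return the original patch, its three rotations, and a 2x upscaled copy."""
    variants = [patch]
    rotated = patch
    for _ in range(3):
        rotated = rotate90(rotated)
        variants.append(rotated)
    variants.append(scale_nearest(patch, 2))
    return variants

lesion = [[1, 2], [3, 4]]  # toy lesion region
for variant in augment(lesion):
    print(variant)
```

Here one lesion patch yields five training samples, which illustrates how augmentation can enlarge a small training set.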

According to an embodiment, the third artificial neural network set 730 includes an artificial neural network for designing a characteristic compound (hereinafter referred to as the 'compound design model'). The compound design model uses as input the result inferred by at least one of the artificial neural networks included in the first artificial neural network set 710. The compound design model may also use as input the property analysis results (e.g., ADMET profiles) of other compounds (e.g., compounds that were dropped from the candidates without being selected as the final compound). In addition, the compound design model may use as input externally collected data on protein-compound interactions.

To this end, the system may include a data collection unit (not shown) that performs data crawling. The data collection unit uses an external data network to collect information on compounds related to the protein structure obtained through genome analysis, and makes that information available to the compound design model. For example, the data collection unit may collect information on compounds by performing web searches and academic database searches periodically or on an event basis. The collected information is processed and then fed to the input nodes of the compound design model.

The search range may vary depending on the training history for the protein structure obtained through genome analysis. For example, if the obtained protein structure, or a structure similar to it, has previously been used as training data, the accuracy of inference through the compound design model can be expected to be high. Accordingly, the system may search the past training history data for the obtained protein structure and determine the amount or range of data to collect through crawling according to the search result. Specifically, the system determines at least one of the number of keywords for the data search, the content of the keywords, and the size of the data to be collected, based on whether the same or a similar protein structure has been learned and on the degree of similarity to a previously learned protein structure. The degree of similarity between protein structures may be determined based on the types of amino acids constituting the protein, their linkage structure, the size of the protein, and the like. To this end, the system may include a database (not shown) that stores information on the training data used to train the artificial neural networks (e.g., the artificial neural networks in the first artificial neural network set 710 and the third artificial neural network set 730). The data collection unit described above may also perform the function of collecting training data.
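One way to picture the dependence of the crawl scope on training history is sketched below. The similarity measure (Jaccard overlap of amino-acid 3-mers), the linear mapping from similarity to keyword count and record count, and the rule that higher similarity means less data is collected are all assumptions made for illustration; the description states only that similarity may depend on amino acid types, linkage structure, and size.

```python
from typing import Dict, Iterable, Set

def kmer_set(sequence: str, k: int = 3) -> Set[str]:
    """All overlapping k-mers of an amino-acid sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard overlap of k-mer sets, in [0, 1] (assumed similarity measure)."""
    set_a, set_b = kmer_set(a, k), kmer_set(b, k)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

def crawl_plan(
    target: str,
    learned: Iterable[str],
    max_keywords: int = 10,
    max_records: int = 1000,
) -> Dict[str, float]:
    """Shrink the crawl as the target looks more like previously learned proteins."""
    best = max((similarity(target, seq) for seq in learned), default=0.0)
    remaining = 1.0 - best
    return {
        "similarity": best,
        "keywords": max(1, round(max_keywords * remaining)),
        "records": round(max_records * remaining),
    }

print(crawl_plan("MKTAYIAKQR", ["MKTAYIAKQR", "GGGSGGGS"]))
print(crawl_plan("MKTAYIAKQR", []))
```

A real system would compare predicted three-dimensional structures rather than raw sequences; sequence k-mers are used here only to keep the sketch self-contained.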

As in the embodiments described above, the system according to an embodiment of the present invention performs genome analysis, molecular diagnosis, protein structure prediction, characteristic compound design, compound property testing, and the like based on the input genomic data. Because the amount of input genomic data is vast, the system may perform the required functions while increasing the complexity of the algorithm step by step in order to improve computational efficiency.

According to an embodiment, the system may use a plurality of artificial neural networks capable of performing the same function. For example, there may be a plurality of artificial neural networks for performing proteomics analysis, and because these networks have different structures, they may differ in inference accuracy and in computational complexity/time. In this case, the system performs inference with one artificial neural network that matches the initially set accuracy and computational complexity/time, and if the result does not converge, it performs inference again using the next artificial neural network in order, which has higher inference accuracy. Whether the result has converged may be determined based on feedback from the user (e.g., an evaluation entered for the result) or on whether a predefined criterion based on the determined properties is met (e.g., a comparison of a quantified property value with a threshold). Whether the predefined criterion based on the determined properties is met may be fed back from the design unit 240 to the genetic information analysis unit 220.
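The escalation from a cheaper network to a more accurate one can be sketched as a cascade. Here each model is a hypothetical callable that returns a result together with a quantified score, and convergence is assumed to mean that the score reaches a threshold; user feedback as a convergence signal is not modeled.

```python
from typing import Any, Callable, Dict, List, Tuple

Model = Callable[[Any], Tuple[Any, float]]

def cascaded_inference(models: List[Model], data: Any, threshold: float) -> Dict[str, Any]:
    """Try models in order of increasing cost until the score meets the threshold."""
    outcome: Dict[str, Any] = {"result": None, "score": None, "level": -1, "converged": False}
    for level, model in enumerate(models):
        result, score = model(data)
        outcome = {"result": result, "score": score, "level": level, "converged": score >= threshold}
        if outcome["converged"]:
            break  # stop early: a cheaper model was good enough
    return outcome

# Toy stand-ins for a fast, coarse network and a slow, accurate network.
def fast_model(data: Any) -> Tuple[str, float]:
    return ("coarse prediction", 0.6)

def accurate_model(data: Any) -> Tuple[str, float]:
    return ("fine prediction", 0.95)

print(cascaded_inference([fast_model, accurate_model], "input", 0.9))
```

The starting position in the list corresponds to the initially set accuracy and complexity level, so beginning at a later index would trade computation time for accuracy from the outset.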

Based on the accumulated inference results, the system may set the initial accuracy and computational complexity/time differently for each function. The system may also adjust the initial values of accuracy and computation time according to the number of functions the user wishes to perform. Alternatively, the system may receive basic data on accuracy and computation time from the user, and determine and apply the initial values based on that data.

As in the embodiments described above, the system according to an embodiment of the present invention uses artificial neural networks to perform the functions required for analysis and development. Accurate inference with an artificial neural network requires that the network be properly trained. To this end, the system may further include a learning unit. However, in the early stages of a system and platform according to an embodiment of the present invention, accurate inference may be difficult to expect because of insufficient training. In particular, when the system and platform are built on a local device rather than on a server, the loss of inference accuracy caused by insufficient training may be greater.

To address this, transfer learning may be applied. Transfer learning is a training method in which the parameters (e.g., weight values) of another, already trained artificial neural network are applied to the artificial neural network in question. That is, during the initial installation of the system, transfer learning using an artificial neural network of another system according to an embodiment of the present invention may be performed in response to a user command or when a given condition is satisfied. Transfer learning may be performed through the procedure shown in FIG. 8 below.

FIG. 8 is a diagram illustrating a procedure for performing transfer learning in a genome analysis and drug substance development system according to an embodiment of the present invention. In FIG. 8, system A and system B are independent systems, each providing a platform according to embodiments of the present invention, and each may be built on a local device or on a server.

Referring to FIG. 8, in step 801, system B registers at least one trained artificial neural network as sharable. That is, the user of system B registers permission for the trained artificial neural network to be used for transfer learning by another system (e.g., system A). Although not shown in FIG. 8, information on the registration may be stored in a separate support server. For registration, system B may display a list of artificial neural networks through a display means, confirm the user's selection and registration command, and then perform the processing for registration (e.g., creating a sharing allow list or notifying a separate support server).

In step 803, system A transmits a request for sharing information. In other words, system A requests information on at least one sharable artificial neural network. In step 805, system B transmits the sharing information. The sharing information includes a list of the at least one sharable artificial neural network. The list may include information on the structure, the function, and the amount of training of each sharable artificial neural network.

In step 807, system A selects at least one artificial neural network to be used for transfer learning. That is, from among the artificial neural networks included in the sharing information provided by system B, system A selects at least one artificial neural network to be used for transfer learning of the artificial neural network that system A intends to train. For example, system A may select an artificial neural network of system B that has the same function as the artificial neural network to be trained. In the example of FIG. 8, sharing information is received only from system B, but system A may also receive sharing information from other systems and select the artificial neural network to be used for transfer learning by comprehensively considering the sharing information provided by the plurality of systems.
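The selection in step 807 can be sketched as a filter-and-rank over the lists received in step 805. The record fields mirror the structure, function, and training-amount information mentioned for the sharing information, but the `SharedNetwork` type, the field names, and the ranking rule (same function required, identical structure preferred, then larger training amount) are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedNetwork:
    """Hypothetical entry of the sharing information list."""
    system_id: str
    network_id: str
    function: str          # e.g. "proteomics"
    structure: tuple       # e.g. layer widths
    training_amount: int   # e.g. number of training samples seen


def select_for_transfer(shared_lists, target_function, target_structure):
    """Choose one shared network from the lists of one or more systems."""
    candidates = [n for lst in shared_lists for n in lst
                  if n.function == target_function]
    if not candidates:
        return None
    # Assumed ranking: identical structure first, then the larger training amount.
    return max(candidates,
               key=lambda n: (n.structure == target_structure, n.training_amount))


from_b = [SharedNetwork("B", "b1", "proteomics", (128, 64, 1), 5000),
          SharedNetwork("B", "b2", "toxicity", (64, 1), 9000)]
from_c = [SharedNetwork("C", "c1", "proteomics", (256, 64, 1), 20000)]
chosen = select_for_transfer([from_b, from_c], "proteomics", (128, 64, 1))
```

Here the network from system B is chosen despite its smaller training amount because its structure matches the target network exactly; a different weighting of the two factors would be equally consistent with the description.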

In step 809, system A transmits a request for information related to the selected artificial neural network. In step 811, system B transmits the requested information related to the artificial neural network. The information related to the artificial neural network may include information necessary for transfer learning (e.g., weight values).

In step 813, system A performs transfer learning. In other words, system A performs transfer learning based on the information related to the artificial neural network provided by system B. Here, system A may reuse all of the weight values included in the received information as they are in its own artificial neural network, or may selectively reuse only some of them. Whether to reuse only some of the weight values may be determined based on the amount of training of the artificial neural network provided by system B, the structural similarity, and the like.
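The full or partial reuse of weight values in step 813 can be sketched as follows. The sketch represents each network as a dictionary of layer names to nested lists, treats matching layer shapes as the measure of structural similarity, and skips reuse entirely when the source network's training amount is below a minimum; all of these choices, and the names `shape_of` and `transfer_weights`, are assumptions for illustration.

```python
import copy


def shape_of(weights):
    """Shape of a nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape = []
    while isinstance(weights, list):
        shape.append(len(weights))
        weights = weights[0] if weights else None
    return tuple(shape)


def transfer_weights(target, source, source_training_amount,
                     min_training_amount=1000):
    """Return (new_target_weights, reused_layer_names).

    Layers are reused only when the source network is sufficiently trained
    and the layer exists in the target with an identical shape.
    """
    result = copy.deepcopy(target)
    reused = []
    if source_training_amount < min_training_amount:
        return result, reused  # source judged too immature to reuse
    for name, weights in source.items():
        if name in result and shape_of(result[name]) == shape_of(weights):
            result[name] = copy.deepcopy(weights)
            reused.append(name)
    return result, reused


target = {"hidden": [[0.0, 0.0], [0.0, 0.0]], "out": [[0.0, 0.0]]}
source = {"hidden": [[1.0, 2.0], [3.0, 4.0]], "out": [[5.0, 6.0, 7.0]]}
new_weights, reused = transfer_weights(target, source, source_training_amount=5000)
```

In this usage only the "hidden" layer is reused; the "out" layer keeps the target's own values because its shape differs from the source. In practice the reused layers would typically be fine-tuned afterwards on the local system's data.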

As in the embodiment described with reference to FIG. 8, transfer learning may be performed through interaction between different systems. According to another embodiment, a separate support system that supports the interaction between systems may exist. In this case, each system registers, with the support system, information on the artificial neural networks it allows to be shared, and a system that wishes to perform transfer learning obtains the data and information necessary for transfer learning by performing request-response signaling with the support system.

Furthermore, after the transfer learning of a given system (e.g., system A), the support system may provide information on the training performed on the artificial neural network in the other system (e.g., system B), and the system that receives this information (e.g., system A) may display, through an interface, information on the progress of additional training of the artificial neural network in the other system to the user. For example, the system may display the information on the progress of additional training in real time through a separate notification, or may display it when execution of the corresponding function is commanded. The user can then request that transfer learning be performed again, so interaction between the systems may increase. According to another embodiment, the information on the training performed on the artificial neural network may be received directly from the other system.
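A minimal sketch of how a system might turn the reported training information into user notifications is given below. The idea of comparing the training amount recorded at transfer time with the currently reported amount, the dictionary layout, and the name `pending_retransfer_notices` are assumptions for illustration; the description only states that progress information is received and displayed.

```python
def pending_retransfer_notices(transfer_log, current_status):
    """Build notification messages for networks trained further since transfer.

    transfer_log:   network id -> training amount recorded at transfer time
    current_status: network id -> training amount currently reported by the
                    support system or directly by the other system
    """
    notices = []
    for net_id, amount_at_transfer in sorted(transfer_log.items()):
        now = current_status.get(net_id)
        if now is not None and now > amount_at_transfer:
            notices.append(
                f"{net_id}: {now - amount_at_transfer} additional training "
                f"samples since last transfer"
            )
    return notices


log = {"B/proteomics": 5000, "B/toxicity": 9000}
status = {"B/proteomics": 7500, "B/toxicity": 9000}
notices = pending_retransfer_notices(log, status)
```

Each message could be shown as a real-time notification or only when the user opens the corresponding function, matching the two display options described above.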

The exemplary methods of the present invention are expressed as a series of operations for clarity of description, but this is not intended to limit the order in which the steps are performed; where necessary, the steps may be performed simultaneously or in a different order. To implement a method according to the present invention, other steps may be included in addition to the illustrated steps, some steps may be excluded while the remaining steps are included, or some steps may be excluded while additional other steps are included.

The various embodiments of the present invention do not enumerate all possible combinations but are intended to describe representative aspects of the present invention, and the matters described in the various embodiments may be applied independently or in a combination of two or more.

In addition, the various embodiments of the present invention may be implemented by hardware, firmware, software, or a combination thereof. In the case of implementation by hardware, the embodiments may be implemented by one or more ASICs (Application Specific Integrated Circuits), DSPs (Digital Signal Processors), DSPDs (Digital Signal Processing Devices), PLDs (Programmable Logic Devices), FPGAs (Field Programmable Gate Arrays), general-purpose processors, controllers, microcontrollers, microprocessors, and the like.

The scope of the present invention includes software or machine-executable instructions (e.g., an operating system, applications, firmware, programs, etc.) that cause operations according to the methods of the various embodiments to be executed on a device or computer, and a non-transitory computer-readable medium in which such software or instructions are stored and from which they are executable on a device or computer.

Claims

A system that provides a platform for analyzing genome and medical information and developing a drug substance, the system comprising:
an input unit for obtaining first data representing a nucleotide sequence of a genome and second data including medical information;
a first analysis unit for analyzing the first data through at least one genome analysis method to identify a variation of a specific nucleotide sequence in the genome, performing an analysis of molecular-level activity in the cell based on the first data, and predicting, based on the analysis result for the first data, a structure of a disease-causing protein produced by a protein-encoding gene in the genome;
a second analysis unit for identifying information on symptoms occurring in a subject by analyzing the second data;
a collection unit for collecting data on compounds related to the structure of the protein and providing the data to a design unit;
the design unit, which generates a structure of a compound for use as a drug substance by determining structures of compounds capable of interacting with the structure of the protein based on the structure of the protein, the analysis result for the first data, and the information on the symptoms, analyzes properties of the compound, including its pharmacokinetics, through absorption, distribution, metabolism, excretion, and toxicity tests on the structure of the compound and through a simulation for predicting the action of the structure of the compound in cells of the subject, and excludes the structure of the compound from candidate substances when, based on a result of analyzing the properties, the properties of the compound involve a side effect; and
an output unit for outputting the analysis results for the first data and the second data and information related to the structures and properties of the remaining compounds not excluded from the candidate substances,
wherein the analysis of molecular-level activity in the cell includes nucleic acid analysis of DNA (deoxyribonucleic acid) or RNA (ribonucleic acid), protein analysis, and intracellular metabolite analysis,
wherein the first analysis unit, the second analysis unit, and the design unit operate using at least one artificial neural network,
wherein the collection unit crawls data on protein-compound interactions by performing a web search and an academic database search through an external communication network, and processes the crawled data to fit the structure of an input layer of the artificial neural network,
wherein the collection range and amount of the crawled data are determined based on whether there is prior learning experience with a protein structure identical or similar to the structure of the compound and, if such learned experience exists, on the degree of similarity between the identical or similar protein structure and the structure of the compound,
wherein the degree of similarity is determined based on at least one of the type, linkage structure, and size of the amino acids constituting the identical or similar protein structure,
wherein the output unit displays an interface including an epigenomics analysis item, an exome analysis item, a GWAS (genome-wide association study) analysis item, a metabolite analysis (metabolomics) item, a pangenomic analysis (metagenomics) item, a protein analysis (proteomics) item, a target sequencing item, a transcriptome analysis item, a WGS (Whole-Genome Sequencing) analysis item, an ADME (Absorption, Distribution, Metabolism, Excretion) analysis item, a Basic Science Research B item, a biomarker development item, a lead identification item, a lead optimization item, a protein interaction analysis item, a target validation item, and a toxicity analysis item, and
wherein the simulation includes a simulation for a reactivity test, a side effect test, and activity prediction for the structure of the compound.

The system according to claim 1,
wherein the first data includes genome information of a subject who is to take a drug containing the drug substance and genome information of a virus or bacterium causing a target disease.

The system according to claim 1,
further comprising a learning unit that extracts a lesion region from medical image data included in the medical information, secures training data by performing data augmentation on the data of the extracted lesion region, and trains the artificial neural networks used by the second analysis unit using the secured training data.

The system according to claim 1,
wherein the compound includes at least one of a substance that reacts specifically with the protein, a compound that destroys a virus or bacterium, a compound that inhibits the activity of a protein produced by the virus or the bacterium, and a compound that inhibits protein production by the virus or the bacterium.

(deleted)

The system according to claim 1,
wherein the first analysis unit and the second analysis unit select, from among a plurality of artificial neural networks having different inference accuracies and computational complexities, at least one artificial neural network to be used for analyzing the first data and the second data, based on the number of functions specified by the user, the accuracy requested by the user, and the computation time requested by the user.

The system according to claim 1,
wherein the design unit determines, based on the properties, whether the compound meets a predefined criterion, and feeds back to the first analysis unit and the second analysis unit whether the compound meets the predefined criterion, and
wherein, when it is fed back that the compound does not meet the predefined criterion, the first analysis unit and the second analysis unit perform the analysis again using at least one other artificial neural network having higher inference accuracy than the at least one artificial neural network last used.

The system according to claim 1,
further comprising a learning unit for training the at least one artificial neural network,
wherein the learning unit receives, from another system, a list of artificial neural networks allowed to be shared for transfer learning, selects at least one artificial neural network to be used for transfer learning based on the list, receives information related to the selected at least one artificial neural network, and performs transfer learning on an artificial neural network of the system based on the received information.

The system according to claim 8,
wherein the learning unit determines the range of weight values of the selected at least one artificial neural network to be reused, based on the amount of training of the selected at least one artificial neural network and the structural similarity.

The system according to claim 8,
wherein the learning unit receives, after the transfer learning, information on additional training of the selected at least one artificial neural network performed in the other system, and
the information on the additional training is displayed to the user through an interface.

A method for analyzing genome and medical information and developing a drug substance, the method comprising:
acquiring, in an input unit, first data representing a nucleotide sequence of a genome and second data including medical information;
in a first analysis unit, analyzing the first data through at least one genome analysis method to identify a variation of a specific nucleotide sequence in the genome, performing an analysis of molecular-level activity in the cell based on the first data, and predicting, based on the analysis result for the first data, a structure of a disease-causing protein produced by a protein-encoding gene in the genome;
in a second analysis unit, identifying information on symptoms occurring in a subject by analyzing the second data;
in a collection unit, collecting data on compounds related to the structure of the protein and providing the data to a design unit;
in the design unit, generating a structure of a compound for use as a drug substance by determining structures of compounds capable of interacting with the structure of the protein based on the structure of the protein, the analysis result for the first data, and the information on the symptoms;
in the design unit, analyzing properties of the compound, including its pharmacokinetics, through absorption, distribution, metabolism, excretion, and toxicity tests on the structure of the compound and through a simulation for predicting the action of the structure of the compound in cells of the subject, and excluding the structure of the compound from candidate substances when, based on a result of analyzing the properties, the properties of the compound involve a side effect; and
outputting, in an output unit, the analysis result for the first data, the analysis result for the second data, and information related to the structures and properties of the remaining compounds not excluded from the candidate substances,
wherein the analysis of molecular-level activity in the cell includes nucleic acid analysis of DNA (deoxyribonucleic acid) or RNA (ribonucleic acid), protein analysis, and intracellular metabolite analysis,
wherein the analysis of the first data, the analysis of the second data, the identification of the structure of the protein, the generation of the structure of the compound, and the analysis of the properties of the compound are performed using at least one artificial neural network,
wherein, in the collection unit, data on protein-compound interactions are crawled by performing a web search and an academic database search through an external communication network, and the crawled data are processed to fit the structure of an input layer of the artificial neural network,
wherein the collection range and amount of the crawled data are determined based on whether there is prior learning experience with a protein structure identical or similar to the structure of the compound and, if such learned experience exists, on the degree of similarity between the identical or similar protein structure and the structure of the compound,
wherein the degree of similarity is determined based on at least one of the type, linkage structure, and size of the amino acids constituting the identical or similar protein structure,
wherein, in the output unit, an interface is displayed that includes an epigenomics analysis item, an exome analysis item, a GWAS (genome-wide association study) analysis item, a metabolite analysis (metabolomics) item, a pangenomic analysis (metagenomics) item, a protein analysis (proteomics) item, a target sequencing item, a transcriptome analysis item, a WGS (Whole-Genome Sequencing) analysis item, an ADME (Absorption, Distribution, Metabolism, Excretion) analysis item, a Basic Science Research B item, a biomarker development item, a lead identification item, a lead optimization item, a protein interaction analysis item, a target validation item, and a toxicity analysis item, and
wherein the simulation includes a simulation for a reactivity test, a side effect test, and activity prediction for the structure of the compound.

The method according to claim 11,
wherein the first data includes genome information of a subject who is to take a drug containing the drug substance and genome information of a virus or bacterium causing a target disease.

The method according to claim 11, further comprising:
extracting, in a learning unit, a lesion region from medical image data included in the medical information;
securing, in the learning unit, training data by performing data augmentation on the data of the extracted lesion region; and
training, in the learning unit, the artificial neural networks used by the second analysis unit using the secured training data.

The method according to claim 11,
wherein the compound includes at least one of a substance that reacts specifically with the protein, a compound that destroys a virus or bacterium, a compound that inhibits the activity of a protein produced by the virus or the bacterium, and a compound that inhibits protein production by the virus or the bacterium.

(deleted)

The method according to claim 11,
wherein analyzing the first data and the second data in the first analysis unit and the second analysis unit includes:
selecting, from among a plurality of artificial neural networks having different inference accuracies and computational complexities, at least one artificial neural network to be used for analyzing the first data and the second data, based on the number of functions specified by the user, the accuracy requested by the user, and the computation time requested by the user.

The method according to claim 11, further comprising:
determining, in the design unit, based on the properties, whether the compound meets a predefined criterion; and
when the compound does not meet the predefined criterion, performing the analysis again in the first analysis unit and the second analysis unit using at least one other artificial neural network having higher inference accuracy than the at least one artificial neural network last used.

The method according to claim 11, further comprising:
receiving, in a learning unit, from another system, a list of artificial neural networks allowed to be shared for transfer learning;
selecting, in the learning unit, at least one artificial neural network to be used for the transfer learning based on the list;
receiving, in the learning unit, information related to the selected at least one artificial neural network; and
performing, in the learning unit, transfer learning on an artificial neural network used in the method based on the received information.

The method according to claim 18,
wherein performing the transfer learning in the learning unit includes:
determining the range of weight values of the selected at least one artificial neural network to be reused, based on the amount of training of the selected at least one artificial neural network and the structural similarity.

The method according to claim 18,
further comprising receiving, in the learning unit, after the transfer learning, information on additional training of the selected at least one artificial neural network performed in the other system,
wherein the information on the additional training is displayed to the user through an interface.