WO2022245062A1 - Method and system for artificial intelligence-based genome analysis and pharmaceutical substance development - Google Patents

Method and system for artificial intelligence-based genome analysis and pharmaceutical substance development

Info

Publication number
WO2022245062A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
artificial neural
analysis
neural network
compound
Prior art date
Application number
PCT/KR2022/006903
Other languages
English (en)
Korean (ko)
Inventor
김원태
김동민
강신욱
이명재
Original Assignee
(주)제이엘케이
Priority date
Filing date
Publication date
Application filed by (주)제이엘케이
Publication of WO2022245062A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 - ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 - ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 - Sequence alignment; Homology search
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 - ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Definitions

  • the present invention relates to genome analysis and the development of medicinal substances, and more particularly to a method and system for analyzing genomes and developing related medicinal substances based on artificial intelligence.
  • an object of the present invention is to provide a method and system for effective genetic analysis of diseases and development of medicinal substances.
  • another object of the present invention is to provide a method and system for analyzing genes based on artificial intelligence.
  • another object of the present invention is to provide a method and system for developing medicinal substances based on artificial intelligence.
  • a system for providing a genome analysis and pharmaceutical substance development platform may include: an input unit for obtaining data representing a nucleotide sequence of a genome; an analysis unit for analyzing the data and, based on the analysis result of the data, identifying the structure of a protein produced by the genome; a design unit for generating a structure of a compound for use as a pharmaceutical substance based on the structure of the protein and the analysis result of the data, and for analyzing the properties of the compound; and an output unit that outputs the analysis result for the data and information related to the structure and properties of the compound.
  • the analysis unit and the design unit may operate using at least one artificial neural network.
  • a method for genome analysis and drug substance development may include the steps of: obtaining data representing the nucleotide sequence of a genome; analyzing the data; identifying, based on the analysis result of the data, the structure of a protein produced by the genome; generating a structure of a compound for use as a pharmaceutical substance based on the structure of the protein and the analysis result of the data; analyzing the properties of the compound; and outputting the analysis result of the data and information related to the structure and properties of the compound.
  • the analysis of the data, the confirmation of the structure of the protein, the generation of the structure of the compound, and the analysis of the properties of the compound may be performed using at least one artificial neural network.
  • genes can be analyzed more effectively and medicinal substances for responding to related diseases can be more effectively developed.
  • FIG. 1 is a diagram showing a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.
  • FIG. 2 is a diagram showing the structure of a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.
  • FIG. 3 is a diagram illustrating an operating procedure of a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.
  • FIG. 4 is a diagram showing an example of a genome analysis interface provided by a genome analysis platform according to an embodiment of the present invention.
  • FIG. 5 is a diagram showing an example of a drug substance development interface provided in a drug substance development platform according to an embodiment of the present invention.
  • FIG. 6 is a diagram showing the structure of an artificial neural network applicable to a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.
  • FIG. 7 is a diagram showing a connection structure of artificial neural networks applicable to a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.
  • FIG. 8 is a diagram illustrating a procedure for performing transfer learning in a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.
  • the present invention proposes a technique for diagnosing and analyzing diseases. Furthermore, the present invention proposes a technique for designing a compound that can be used as a pharmaceutical substance to respond to the diagnosed and analyzed disease, that is, a candidate substance.
  • FIG. 1 is a diagram showing a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.
  • the genome analysis and drug substance development system includes a local device 110a, a local device 110b, and a server 120 connected to a communication network. FIG. 1 illustrates two local devices 110a and 110b.
  • the local device 110a and the local device 110b are used by a user who wants to diagnose and analyze a disease by utilizing the system.
  • the local device 110a and the local device 110b may acquire input data, transmit the input data to the server 120 through a communication network, and receive data including a result of analysis from the server 120.
  • the server 120 provides a platform for diagnosis and analysis of diseases and design of compounds to be used as medicinal substances according to embodiments of the present invention, and performs diagnosis, analysis, and design algorithms. According to various embodiments, algorithms for diagnosis, analysis, and design may be performed based on artificial intelligence.
  • the server 120 performs operations such as molecular diagnosis, genetic diagnosis, and designing a compound for a medicinal substance based on data received from at least one of the local device 110a and the local device 110b, and sends the resulting data to at least one of the local device 110a and the local device 110b.
  • the server 120 may be a cloud server.
  • the local device 110a and the local device 110b are terminals and perform data input and output functions, and the server 120 performs diagnosis, analysis, and design functions.
  • the local device 110a and the local device 110b may perform at least some calculations for diagnosis, analysis, and design. The degree of distribution of calculations for diagnosis, analysis, and design may be different for each local device.
  • a local device including all functions of the server 120 may also exist. In this case, the local device 110a or 110b can perform diagnosis, analysis, and design operations even when not connected to a communication network.
  • the server 120 may provide a platform for analysis/diagnosis of diseases and development/design of medicinal substances.
  • the platform according to an embodiment forms a green zone for new drug development and can provide disease diagnosis and prediction services.
  • the platform according to an embodiment provides a bioinformatics service based on genome big data information using artificial intelligence.
  • FIG. 2 is a diagram showing the structure of a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.
  • the components illustrated in FIG. 2 may be included in one of a local device (e.g., local device 110a or local device 110b in FIG. 1) and a server (e.g., server 120 in FIG. 1), and how each component is deployed on the local device and the server may vary according to various embodiments. Accordingly, connections between components may be based on internal circuits or external communication networks.
  • the system includes a data input unit 210, an analysis unit 220, a design unit 230, and an output unit 240.
  • the data input unit 210 is a means for inputting genome data.
  • the genomic data includes at least one of data related to a subject (eg, a patient) or data related to a virus/bacterium that causes a disease.
  • the genomic data includes information on the DNA or RNA base sequence of a subject or virus/bacteria.
  • the data input unit 210 may receive genome data through an external communication network or receive genome data through local hardware (eg, a memory port or a user input device).
  • a user who wants to use the system may input genome data through the data input unit 210 .
  • the genome data may be input in the form of a file configured according to a predefined format.
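  • as an illustration of file-based input, the following minimal sketch assumes that the predefined format is FASTA. The source does not name a format, and `parse_fasta` is a hypothetical helper rather than part of the disclosed system.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {record_id: sequence}.

    FASTA is an assumed input format; the source only says "predefined format".
    """
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: the record id is the first whitespace-delimited token.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else f"record{len(records) + 1}"
            records[current_id] = []
        elif current_id is not None:
            # Sequence line: normalize to upper case and collect.
            records[current_id].append(line.upper())
    return {rid: "".join(parts) for rid, parts in records.items()}


# Made-up example input with a subject record and a virus record.
example = ">sample1 hypothetical subject\nACGT\nacgu\n>virus1\nGGCC\n"
genome_data = parse_fasta(example)
```

  • each record maps an identifier to one contiguous DNA or RNA string, which later steps can consume regardless of how the file was delivered (network or local hardware).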
  • the analysis unit 220 analyzes the genome data input through the data input unit 210 .
  • the analysis unit 220 analyzes the genome data, confirms genetic characteristics present in the nucleotide sequence of the subject or virus/bacterium, and can estimate protein structures or intracellular activities that can be produced according to the nucleotide sequence.
  • the protein structure generating unit 226 may obtain data related to diseases caused by the subject's genes as well as diseases caused by external viruses/bacteria.
  • the analysis unit 220 includes a genome analysis unit 222, a molecular diagnosis analysis unit 224, and a protein structure generation unit 226.
  • the genome analysis unit 222 may analyze genome data using various techniques.
  • the genome analysis unit 222 identifies genetic characteristics by analyzing genome data of a subject.
  • the genome analysis unit 222 may check a mutation of a specific nucleotide sequence on the subject's genome.
  • the genome analysis unit 222 may analyze genome data using various analysis techniques such as single nucleotide polymorphism (SNP) analysis, single-strand conformation polymorphism (SSCP) analysis, amplified fragment length polymorphism (AFLP) analysis, random amplified polymorphic DNA (RAPD) analysis, allele-specific PCR (AS-PCR) analysis, dynamic allele-specific hybridization (DASH) analysis, whole-genome sequencing (WGS) analysis, and next-generation sequencing (NGS) analysis.
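  • as a simplified illustration of checking for mutations in a nucleotide sequence, the sketch below lists single-base substitutions between a reference and a sample. It assumes the two sequences are already aligned and of equal length; `find_substitutions` is a hypothetical name, and none of the laboratory techniques listed above is implemented here.

```python
def find_substitutions(reference, sample):
    """Return (position, ref_base, sample_base) for each mismatch.

    Assumes both sequences are already aligned and equal in length;
    a real pipeline would align reads first. Positions are 1-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, r, s)
        for i, (r, s) in enumerate(zip(reference.upper(), sample.upper()))
        if r != s
    ]


# Made-up sequences that differ at positions 5 and 8.
variants = find_substitutions("ACGTACGT", "ACGTTCGA")
```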
  • the molecular diagnosis analysis unit 224 performs molecular diagnosis based on genome data.
  • the molecular diagnosis analysis unit 224 identifies various molecular-level activities that may occur within the subject's cells based on the genome data.
  • the molecular diagnosis analysis unit 224 detects various molecular-level changes occurring within cells through numerical values or images.
  • the molecular diagnosis analysis unit 224 may perform analysis of nucleic acids such as DNA or RNA, protein analysis, intracellular metabolome analysis, and the like.
  • the protein structure generation unit 226 predicts the structure of a protein that can be produced by the subject's gene based on the genome data. That is, the protein structure generation unit 226 predicts the structure of a disease-causing protein. If the genome data of the virus/bacterium is secured, the protein structure generation unit 226 can also predict the structure of a protein that can be produced by the gene of the virus/bacterium.
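  • structure prediction itself is described as the task of an artificial neural network. As a hedged sketch of a preparatory step that such a predictor might need, the code below translates an in-frame coding sequence into an amino acid sequence using the standard genetic code. The function name `translate` and the assumption that the input is an in-frame coding sequence containing only A, C, G, T/U are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence):
    """Translate an in-frame coding sequence, stopping at the first stop codon.

    Raises KeyError for codons containing characters other than A, C, G, T/U.
    """
    dna = sequence.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


peptide = translate("ATGGCCAAGTAA")
```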
  • analysis operations performed by the analysis unit 220, including the genome analysis unit 222, the molecular diagnosis analysis unit 224, and the protein structure generation unit 226, are performed using genome data (e.g., electronic files) rather than actual genes. Accordingly, compared to analysis using actual genes, analysis can be performed in a short time and at low cost.
  • the analysis unit 220 may use an artificial neural network (ANN) based on machine learning or deep learning. In this case, the analysis unit 220 may use a pre-trained artificial neural network, or may train an initial artificial neural network itself and then use it.
  • the design unit 230 designs a compound that can be used as a medicinal substance based on the genome data input through the data input unit 210 and the analysis result of the genome data by the analysis unit 220 .
  • the design unit 230 includes a characteristic compound generation unit 232 and an absorption, distribution, metabolism, excretion, and toxicity (ADMET) prediction unit 234 .
  • the characteristic compound generation unit 232 generates a characteristic compound structure corresponding to the protein structure predicted by the protein structure generation unit 226.
  • the characteristic compound is generated to have a structure capable of interacting with (e.g., binding to) the predicted protein, and to have properties capable of inhibiting disease by interacting with the predicted protein.
  • the ADMET prediction unit 234 performs absorption, distribution, metabolism, excretion, and toxicity tests on the compound generated by the characteristic compound generation unit 232. In other words, the ADMET prediction unit 234 analyzes the ADMET profile and drug action mechanism of the generated compound. That is, the ADMET prediction unit 234 analyzes how the generated compound acts in the cells of the subject based on the obtained genome data. In this case, the ADMET prediction unit 234 may build a mathematical model of the subject's characteristics based on the obtained genome data, and generate an ADMET profile based on the constructed mathematical model.
  • analysis operations performed by the design unit 230, including the characteristic compound generation unit 232 and the ADMET prediction unit 234, are performed using compound data (e.g., electronic files) rather than actual compounds. Accordingly, compared to analysis using actual compounds, design can be carried out in a short time and at low cost.
  • the design unit 230 may use an artificial neural network based on machine learning or deep learning. In this case, the design unit 230 may use a pre-trained artificial neural network, or may train an initial artificial neural network itself and then use it.
  • the output unit 240 outputs results generated by the analysis unit 220 and the design unit 230 .
  • the output unit 240 may output the result as a tangible object (eg, printed matter) or in the form of a digital file.
  • the digital file may be stored in a storage unit in the system or transmitted to a pre-designated destination (eg, a designated e-mail, a designated phone number, or a designated application) through an external communication network.
  • the system may include an interface means for interaction with a user.
  • the system may include input means such as a keyboard, mouse, and touch screen, and display means such as a monitor, touch screen, and projector.
  • FIG. 3 is a diagram illustrating an operating procedure of a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.
  • the entity performing the operations of FIG. 3 is referred to as the 'system'.
  • in step 301, the system acquires genome data.
  • the acquired genome data may include DNA or RNA nucleotide sequence data of the subject. DNA or RNA nucleotide sequence data of a virus or bacterium that causes a disease may additionally be included.
  • in step 303, the system analyzes the genome data.
  • the system may perform various genome analysis techniques based on genome data.
  • the system may take the genome data as an input value and analyze the genome data using an artificial neural network.
  • the system can process genome data into a form that can be input into an artificial neural network.
  • the system can estimate the genetic characteristics, molecular activity, structure of the produced protein, etc. of the subject or virus/bacterium. Due to the use of an artificial neural network, the system can obtain analysis data on genomes without actual chemical experiments.
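  • the source does not specify how genome data is processed into a network input. One common choice, shown here only as an assumed example, is one-hot encoding of each base with padding or truncation to a fixed length; `one_hot_encode` is a hypothetical helper.

```python
def one_hot_encode(sequence, length):
    """Encode a nucleotide sequence as a length x 4 matrix (columns A, C, G, T/U).

    Sequences are truncated or zero-padded to `length`; unknown bases
    (e.g. N) become all-zero rows. The encoding itself is an assumption.
    """
    index = {"A": 0, "C": 1, "G": 2, "T": 3, "U": 3}
    matrix = [[0.0] * 4 for _ in range(length)]
    for pos, base in enumerate(sequence.upper()[:length]):
        col = index.get(base)
        if col is not None:
            matrix[pos][col] = 1.0
    return matrix


# Four bases (one unknown) padded to six rows.
encoded = one_hot_encode("ACGN", 6)
```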
  • in step 305, the system designs a characteristic compound and analyzes properties of the designed characteristic compound. That is, based on the analysis result obtained through step 303, the system designs a compound that specifically reacts to the subject or virus/bacteria.
  • the system may design at least one of a compound that interacts with a disease-causing protein produced by the subject's genes, a compound that destroys viruses/bacteria, a compound that inhibits the activity of proteins produced by viruses/bacteria, and a compound that inhibits protein production by viruses/bacteria. A designed compound may be referred to as a candidate substance, candidate drug, or the like.
  • the system generates an ADMET profile for at least one designed compound. That is, the system can perform simulations to predict how the designed compound will behave within the subject's cells.
  • the system performs genome reactivity test, side effect test, and activity prediction for the candidate substance.
  • the operation of step 305 may be performed based on artificial intelligence using an artificial neural network. Due to the use of an artificial neural network, the system can acquire compounds and analytical data on compounds without actual chemical experiments.
  • the system outputs the analysis result.
  • the analysis result may be output in the form of a tangible object (eg, printed matter) or in the form of a digital file.
  • the digital file may be stored in a storage unit in the system or transmitted to a pre-designated destination (eg, a designated e-mail, a designated phone number, or a designated application) through an external communication network.
  • the above-described operations may be performed by one device (eg, the local devices 110a and 110b or the server 120), or may be performed by two or more devices.
  • an operation in which necessary information is exchanged between the two or more devices may be added between some operations.
  • for example, when step 301 is performed by a local device and step 303 is performed by a server, an operation in which the local device transfers genome data to the server may be added prior to step 303.
  • the system designs a specific compound and analyzes the properties of the designed compound.
  • if the analyzed properties of the designed compound do not meet predefined criteria, the system may exclude the compound from candidate substances.
  • the system may redesign another compound or reselect another one of a plurality of previously designed compounds, and then analyze the properties. This operation may be repeated until a compound meeting the criteria is determined or until a predetermined failure condition is satisfied.
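  • the design, analysis, and redesign cycle described above can be sketched as a bounded loop. In this sketch the three callables are hypothetical stand-ins for the design unit's networks, a maximum attempt count stands in for the predetermined failure condition, and the toy toxicity scores are made up.

```python
def select_candidate(design_compound, predict_properties, meets_criteria, max_attempts=10):
    """Design, evaluate, and redesign until a compound passes or attempts run out.

    All three callables are hypothetical stand-ins for the design unit's
    neural networks; `max_attempts` plays the role of the failure condition.
    Returns (compound, profile) on success or None on failure.
    """
    rejected = []
    for _ in range(max_attempts):
        compound = design_compound(rejected)
        profile = predict_properties(compound)
        if meets_criteria(profile):
            return compound, profile
        rejected.append((compound, profile))  # excluded from the candidates
    return None


# Toy usage with made-up compound names and a made-up toxicity score.
toy_scores = {"cmpd-A": 0.9, "cmpd-B": 0.6, "cmpd-C": 0.2}
names = list(toy_scores)
result = select_candidate(
    design_compound=lambda rejected: names[len(rejected)],
    predict_properties=lambda c: {"toxicity": toy_scores[c]},
    meets_criteria=lambda p: p["toxicity"] < 0.3,
    max_attempts=3,
)
```

  • passing the rejected compounds and their profiles back into the design step is a design choice of this sketch; it lets a later design attempt take earlier failures into account.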
  • a user who wants to diagnose and analyze diseases and develop medicinal substances using the system can input necessary data and command analysis using a local device.
  • the local device or server provides results to the user by performing the operations described with reference to FIG. 3 .
  • the local device may provide an interface according to various embodiments for user interaction such as data input and command input. That is, the local device can display various interfaces using the display means. Examples of interfaces are described below with reference to FIGS. 4 and 5 .
  • the genome analysis interface includes a menu 410, a search bar 420, and function items 430.
  • Each of the function items 431 to 439 may be composed of an image and a name representing an analysis target or technique.
  • the function items 430 include at least one of an epigenomics analysis item 431, an exome analysis item 432, a genome-wide association study (GWAS) analysis item 433, a metabolomics analysis item 434, a metagenomics analysis item 435, a proteomics analysis item 436, a target sequencing analysis item 437, a transcriptome analysis item 438, and a whole-genome sequencing (WGS) analysis item 439.
  • the user can proceed with the corresponding analysis technique by clicking or selecting the item of the function to be used for analysis.
  • tools for next-generation sequencing analysis and the like, such as whole-genome analysis, microRNA target gene and disease association prediction analysis, large-scale transcript analysis, and gene interaction network analysis, may be provided.
  • This platform can be understood as a convergence technology of disease diagnosis, drug substance development, and IT that enables the utilization of data that greatly increases every year according to the trend of big data and convergence and the improvement of NGS technology. That is, the present invention is based on analysis software for analyzing and interpreting vast amounts of genetic information in various fields, and can provide various researchers or companies with a platform that can be used in various industries.
  • the pharmaceutical substance development interface includes a menu 510, a search bar 520, and function items 530.
  • Each of the function items 530 may be composed of an image and a name representing an analysis target or technique.
  • the function items 530 include an absorption, distribution, metabolism, and excretion (ADME) analysis item 531, a basic science research item 532, a biomarker development item 533, a lead identification item 534, a lead optimization item 535, a protein interaction analysis item 536, a target validation item 537, and a toxicity analysis item 538.
  • the present invention is to provide a big data-based platform utilizing artificial intelligence. Through this, developers will be able to increase efficiency in the process of developing new drugs.
  • the system and platform analyze a genome and provide functions for developing a medicinal substance based on the analysis result.
  • calculation algorithms for analysis and development are performed using data rather than actual genome samples.
  • calculation algorithms for analysis and development may be implemented based on artificial intelligence (AI).
  • Artificial intelligence means that machines such as computers perform thinking, learning, and analysis that are possible with human intelligence.
  • technologies that apply such artificial intelligence to the medical industry are increasing.
  • artificial neural networks are widely used. An example of an artificial neural network applicable to the present invention is shown in FIG. 6 below.
  • an artificial neural network includes an input layer 610, at least one hidden layer 620, and an output layer 630.
  • Each of the layers 610, 620, and 630 is composed of a plurality of nodes, and each node is connected to an output of at least one node belonging to the previous layer.
  • Each node adds a bias to the inner product of the output values of the nodes in the previous layer and the corresponding connection weights, applies a non-linear activation function to the result, and delivers the resulting output value to at least one node in the next layer.
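  • the per-node computation (weighted sum of the previous layer's outputs, plus a bias, passed through a non-linear activation) can be written out as a minimal forward pass. The tanh activation, the layer sizes, and the weights below are arbitrary choices for illustration, not values from the source.

```python
import math


def dense_layer(inputs, weights, biases, activation=math.tanh):
    """One fully connected layer: activation(w . x + b) for every node.

    weights[j] holds the connection weights from the previous layer to node j.
    """
    outputs = []
    for node_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs


def forward(inputs, layers):
    """Propagate inputs through a list of (weights, biases) layers."""
    values = inputs
    for weights, biases in layers:
        values = dense_layer(values, weights, biases)
    return values


# Toy network: 2 inputs -> 2 hidden nodes -> 1 output node (made-up weights).
network = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]
output = forward([1.0, 1.0], network)
```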
  • Artificial neural network models used in various embodiments of the present invention include at least one of a fully convolutional neural network, a convolutional neural network, a recurrent neural network, a restricted Boltzmann machine (RBM), and a deep belief neural network (DBN), but are not limited thereto.
  • a deep learning-based model may be applied to extract features of an image, and a machine learning-based model may be applied when the image is classified or recognized based on the extracted features.
  • the machine learning-based model may include a Support Vector Machine (SVM), AdaBoost, and the like, but is not limited thereto.
  • an artificial neural network configured similarly to the example of FIG. 6 may be used.
  • the system according to an embodiment of the present invention may have a plurality of artificial neural networks.
  • a plurality of artificial neural networks may be classified according to functions.
  • the plurality of artificial neural networks may include at least one artificial neural network for genetic analysis and at least one artificial neural network for drug substance development.
  • at least one artificial neural network for gene analysis may be divided into at least one artificial neural network for genome analysis of a subject and at least one artificial neural network for genome analysis of viruses/bacteria.
  • the plurality of artificial neural networks may further include at least one artificial neural network for selecting an artificial neural network to be used for analysis or development.
  • FIG. 7 is a diagram showing a connection structure of artificial neural networks applicable to a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.
  • the system includes a first artificial neural network set 710 including a plurality of artificial neural networks for genome analysis and a second artificial neural network set 720 including a plurality of artificial neural networks for pharmaceutical substance development.
  • the plurality of artificial neural networks included in the first artificial neural network set 710 may be classified according to functions (eg, an analysis method, whether an analysis target is a subject or a virus/bacteria).
  • the first artificial neural network set 710 may include artificial neural networks corresponding to each of the items 431 to 439 illustrated in FIG. 4.
  • the plurality of artificial neural networks included in the first artificial neural network set 710 may have different structures. Specifically, the plurality of artificial neural networks may differ from each other in the number of input nodes, the shape of input values, and the like. Accordingly, the system can confirm a command by the user through the user interface and process the input genome data according to the form of input values required by the artificial neural network corresponding to the confirmed command.
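  • one way to picture the command-dependent preprocessing is a registry that maps each function item to the input form its network expects. The registry contents, the input lengths, and the padding symbol in this sketch are all hypothetical.

```python
# Hypothetical registry: function item -> input form required by its network.
MODEL_INPUT_SPECS = {
    "exome": {"input_length": 8},
    "proteomics": {"input_length": 12},
}


def prepare_input(command, sequence, pad="N"):
    """Fit `sequence` to the input length of the network selected by `command`."""
    try:
        spec = MODEL_INPUT_SPECS[command]
    except KeyError:
        raise ValueError(f"no network registered for command {command!r}")
    length = spec["input_length"]
    # Truncate long sequences and pad short ones so every input has the same size.
    return sequence[:length].ljust(length, pad)


prepared = prepare_input("exome", "ACGTA")
```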
  • the plurality of artificial neural networks included in the second artificial neural network set 720 may be classified according to functions (eg, an analysis method, whether an analysis target is a subject or a virus/bacteria).
  • the second artificial neural network set 720 may include artificial neural networks corresponding to each of the items 531 to 538 illustrated in FIG. 5.
  • the plurality of artificial neural networks included in the second artificial neural network set 720 may have different structures. Specifically, the plurality of artificial neural networks may differ from each other in the number of input nodes, the shape of input values, and the like. Therefore, the system can confirm the user's command through the user interface and process the input genome analysis result data according to the form of input values required by the artificial neural network corresponding to the confirmed command and the data type of the previously performed analysis result.
  • the second artificial neural network set 720 includes artificial neural networks (hereinafter referred to as 'compound design models') for designing characteristic compounds.
  • the compound design model uses a result inferred from at least one of the artificial neural networks included in the first artificial neural network set 710 as an input.
  • the compound design model may use as an input the result of an attribute analysis (eg, an ADMET profile) of another compound (eg, a compound that was not selected as a final compound and dropped from the candidates).
  • the compound design model can use data on the interaction of proteins and compounds collected from the outside as an input.
  • the system may include a data collection unit (not shown) that performs data crawling.
  • the data collection unit collects information on compounds related to the structure of proteins obtained through genome analysis using an external data network, and provides them so that they can be used in a compound design model.
  • the data collection unit may collect information on compounds by periodically or event-based web search or academic database search. The collected information is processed and input to the input nodes of the compound design model.
  • the search range may vary based on the learning history of the protein structure obtained through genome analysis. For example, when there is an experience of learning by using the structure of the obtained protein or a structure similar thereto as learning data, it can be expected that the accuracy of inference through the compound design model will be high. Accordingly, the system may search for the obtained protein structure in past learning history data and determine the amount or range of data to be collected through crawling according to the search result. Specifically, the system determines at least one of the number of keywords for data search, the content of the keyword, and the size of data to be collected based on whether there is an experience of learning the same or similar protein structure and the degree of similarity to the protein structure with the learning experience.
  • the similarity of the protein structure may be determined based on the type, connection structure, size, etc. of amino acids constituting the protein.
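  • the sketch below illustrates how a similarity score against previously learned proteins could set the crawl scope. It makes two assumptions that are not in the source: similarity is approximated from amino acid composition and sequence length only (connection structure is ignored), and a higher similarity leads to fewer search keywords. All names and numeric bounds are hypothetical.

```python
from collections import Counter
import math


def composition_similarity(seq_a, seq_b):
    """Cosine similarity of amino acid counts, scaled by the length ratio.

    A stand-in metric covering only amino acid type and size; it ignores
    connection structure, which the source also lists.
    """
    if not seq_a or not seq_b:
        return 0.0
    a, b = Counter(seq_a), Counter(seq_b)
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    length_ratio = min(len(seq_a), len(seq_b)) / max(len(seq_a), len(seq_b))
    return (dot / norm) * length_ratio


def crawl_scope(protein, learned_proteins, max_keywords=20, min_keywords=2):
    """Assumed policy: collect less external data when a similar protein was learned."""
    best = max((composition_similarity(protein, p) for p in learned_proteins), default=0.0)
    keywords = round(max_keywords - best * (max_keywords - min_keywords))
    return {"best_similarity": best, "keyword_count": keywords}


# Made-up sequences: one already in the learning history, one with no history.
scope_known = crawl_scope("MAKLV", ["MAKLV", "GGGG"])
scope_new = crawl_scope("MAKLV", [])
```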
  • the system may include a database (not shown) for storing information on learning data used for training artificial neural networks (e.g., artificial neural networks in the first artificial neural network set 710 and the second artificial neural network set 720).
  • the aforementioned data collection unit may also perform a function of collecting learning data.
  • The system performs genome analysis, molecular diagnosis, protein structure prediction, characteristic compound design, and compound property inspection based on input genome data.
  • The system can perform the necessary functions while increasing the complexity of the algorithm step by step.
  • The system may use a plurality of artificial neural networks capable of performing the same function.
  • For example, a plurality of artificial neural networks may be provided for performing protein analysis (proteomics).
  • The plurality of artificial neural networks may differ in inference accuracy and in computational complexity/time because their structures differ.
  • The system first performs inference with one artificial neural network that meets the initially set accuracy and computational complexity/time and then, if the result does not converge, may perform inference again with the next artificial neural network, which has higher inference accuracy.
  • Whether the result has converged may be determined based on feedback from the user (e.g., an evaluation input for the result) and on whether a predefined criterion based on the determined attribute is satisfied (e.g., a comparison between a quantified attribute value and a threshold value).
  • Whether the predefined criterion based on the determined attribute is satisfied may be fed back from the design unit 230 to the analysis unit 220.
  • The system may set the initial accuracy and computational complexity/time differently for each function. The system can also adjust the initial values of accuracy and computation time according to the number of functions the user wants to perform. Alternatively, the system may receive basic data on accuracy and computation time from the user, and determine and apply initial values based on that data.
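The stepwise escalation described above can be sketched as a simple cascade. This is an illustration under stated assumptions rather than the implementation of the application: `Candidate`, `cascade_infer`, and all parameter names are hypothetical, each network is reduced to a callable returning one quantified attribute value, and convergence is taken to mean that the value reaches a threshold and that the optional user feedback accepts it. Falling back to the least accurate network when none meets the initial settings is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class Candidate:
    name: str
    accuracy: float                # estimated inference accuracy, 0..1
    cost: float                    # relative computational complexity/time
    infer: Callable[[str], float]  # returns a quantified attribute value


def cascade_infer(
    data: str,
    candidates: Sequence[Candidate],
    min_accuracy: float,
    max_cost: float,
    threshold: float,
    user_accepts: Optional[Callable[[float], bool]] = None,
) -> Tuple[str, float, bool]:
    """Return (network name, attribute value, converged) for the last network tried."""
    if not candidates:
        raise ValueError("at least one candidate network is required")
    ordered = sorted(candidates, key=lambda c: c.accuracy)
    # Start with the first network that meets the initial accuracy/cost settings
    # (index 0 if none does: an assumption of this sketch).
    start = next(
        (i for i, c in enumerate(ordered)
         if c.accuracy >= min_accuracy and c.cost <= max_cost),
        0,
    )
    name, value, converged = "", 0.0, False
    for cand in ordered[start:]:
        value = cand.infer(data)
        name = cand.name
        # Converged: predefined criterion met and, if given, user feedback accepts.
        converged = value >= threshold and (user_accepts is None or user_accepts(value))
        if converged:
            break  # otherwise escalate to the next, more accurate network
    return name, value, converged
```

Only the starting network is constrained by `max_cost`; later, more accurate networks are tried regardless of cost, which matches the idea of raising complexity step by step.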
  • The system uses an artificial neural network to perform the functions needed for analysis and development. Accurate inference with an artificial neural network requires that the network be properly trained. To this end, the system may further include a learning unit.
  • Accurate inference may be difficult to expect when the amount of training is insufficient. In particular, when the system and platform are built on a local device rather than a server, the loss of inference accuracy due to insufficient training may be greater.
  • Transfer learning is a learning method that applies the variables (e.g., weight values) of another, already trained artificial neural network to the artificial neural network in question. That is, during initial installation of the system, transfer learning using an artificial neural network of another system according to an embodiment of the present invention may be performed upon a user's command or when a given condition is satisfied. Transfer learning may be performed through a procedure such as that shown in FIG. 8 below.
  • System A and system B are independent systems, each providing a platform according to embodiments of the present invention, and each may be built on a local device or a server.
  • System B registers at least one trained artificial neural network as sharable. That is, the user of system B registers that the trained artificial neural network may be used for transfer learning by another system (e.g., system A).
  • Whether a network is registered may be stored in a separate support server.
  • System B may display a list of artificial neural networks through a display means, confirm the user's selection and registration command, and then perform the registration processing (e.g., creating a sharing whitelist or notifying a separate support server).
  • System A sends a request for shared information.
  • System A requests information about at least one artificial neural network that can be shared.
  • System B transmits the shared information.
  • The shared information includes a list of at least one sharable artificial neural network.
  • The list may include structure information, function information, and learning amount information for each sharable artificial neural network.
  • System A selects at least one artificial neural network to use for transfer learning. That is, from among the artificial neural networks included in the shared information provided by system B, system A selects at least one to use for transfer learning of the artificial neural network that system A is to train. For example, system A may select an artificial neural network of system B that has the same function as the artificial neural network to be trained.
  • In this example, shared information is received only from system B, but system A may also receive shared information from other systems, consider the shared information provided by the plurality of systems together, and select the artificial neural network to use for transfer learning.
  • System A sends a request for information related to the selected artificial neural network.
  • System B transmits the information related to the requested artificial neural network.
  • The information related to the artificial neural network may include information necessary for transfer learning (e.g., weight values).
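The registration and request-response exchange between system A and system B can be sketched with in-memory objects. Everything here is hypothetical: the class and method names (`NetworkRecord`, `System`, `register_sharable`, `shared_info`, `network_info`, `select_for_transfer`), the dictionary fields, and the rule of preferring the network with the largest learning amount among those with the same function. The description only requires that the list carry structure, function, and learning amount information, and that weights be sent on a separate request.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class NetworkRecord:
    name: str
    function: str                 # e.g. "protein_structure_prediction"
    structure: Tuple[int, ...]    # layer sizes, standing in for structure information
    learning_amount: int          # e.g. number of training samples seen
    weights: Dict[str, List[float]] = field(default_factory=dict)
    sharable: bool = False


class System:
    def __init__(self, name: str) -> None:
        self.name = name
        self.networks: Dict[str, NetworkRecord] = {}

    def add_network(self, rec: NetworkRecord) -> None:
        self.networks[rec.name] = rec

    def register_sharable(self, network_name: str) -> None:
        """User command: allow this trained network to be used by other systems."""
        self.networks[network_name].sharable = True

    def shared_info(self) -> List[dict]:
        """Response to a shared-information request: a list without weights."""
        return [
            {"name": r.name, "function": r.function,
             "structure": r.structure, "learning_amount": r.learning_amount}
            for r in self.networks.values() if r.sharable
        ]

    def network_info(self, network_name: str) -> dict:
        """Response to a request for a selected network: includes the weights."""
        rec = self.networks.get(network_name)
        if rec is None or not rec.sharable:
            raise PermissionError(f"{network_name} is not registered as sharable")
        return {"structure": rec.structure, "weights": dict(rec.weights)}


def select_for_transfer(shared: List[dict], function: str) -> Optional[dict]:
    """Pick a sharable network with the same function (most-trained one: assumption)."""
    matches = [e for e in shared if e["function"] == function]
    return max(matches, key=lambda e: e["learning_amount"], default=None)
```

Because `select_for_transfer` takes a plain list, entries gathered from several systems can be concatenated before selection, which corresponds to the multi-system case mentioned above.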
  • System A performs transfer learning.
  • System A performs transfer learning based on the information related to the artificial neural network provided by system B.
  • System A may reuse all of the weight values included in the received information as they are in its own artificial neural network, or selectively reuse only some of them. Whether to reuse them only partially may be determined based on the learning amount of the artificial neural network provided by system B, the similarity in structure, and the like.
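Full versus partial reuse of received weights can be illustrated with a small helper. The policy shown is an assumption made for the sake of a concrete sketch: weights are flat lists keyed by layer name, a layer counts as structurally similar when its name exists on both sides and its length matches, and nothing is reused when the source network's learning amount falls below a threshold. The function name `reuse_weights` and the default threshold are hypothetical.

```python
from typing import Dict, List, Tuple

Weights = Dict[str, List[float]]


def reuse_weights(
    own: Weights,
    received: Weights,
    received_learning_amount: int,
    min_learning_amount: int = 1000,
) -> Tuple[Weights, List[str]]:
    """Return (new weights for the local network, names of the reused layers)."""
    # Start from a copy of the local weights so the inputs are left untouched.
    result: Weights = {layer: list(values) for layer, values in own.items()}
    reused: List[str] = []
    if received_learning_amount < min_learning_amount:
        return result, reused  # source network too lightly trained: reuse nothing
    for layer, values in own.items():
        candidate = received.get(layer)
        # Reuse only layers whose shape matches; the others keep local values.
        if candidate is not None and len(candidate) == len(values):
            result[layer] = list(candidate)
            reused.append(layer)
    return result, reused
```

If every layer matches, the call amounts to full reuse; if only some match, the returned list of layer names shows which parts of the local network still need training from their initial values.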
  • Transfer learning may be performed through interaction between different systems.
  • A separate support system that supports interaction between systems may exist.
  • Each system registers information on the artificial neural networks it allows to be shared with the support system, and a system that wants to perform transfer learning can acquire the data and information necessary for transfer learning by performing request-response signaling with the support system.
  • After a system (e.g., system A) has performed transfer learning, the support system may provide it with information on the learning performance of the artificial neural network in the other system (e.g., system B), and the system that receives this information (e.g., system A) may display the progress of additional learning of the artificial neural network in the other system to the user through an interface.
  • The system may display information on the additional learning progress in real time through a separate notification, or display it when the corresponding function is invoked. Because the user may then request that transfer learning be performed again, interaction between systems may increase.
  • Information on the learning performance of the artificial neural network may also be received directly from the other system.
  • Various embodiments of the present invention may be implemented by hardware, firmware, software, or a combination thereof.
  • For implementation by hardware, they may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), general processors, controllers, microcontrollers, microprocessors, or the like.
  • The scope of the present invention includes software or machine-executable instructions (e.g., operating systems, applications, firmware, programs, etc.) that cause operations according to the methods of various embodiments to be executed on a device or computer, and a non-transitory computer-readable medium in which such software or instructions are stored and executable on a device or computer.
  • The above can be applied to various fields of artificial-intelligence-based pharmaceutical substance development.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • General Engineering & Computer Science (AREA)
  • Public Health (AREA)
  • Bioethics (AREA)
  • Genetics & Genomics (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to artificial-intelligence-based genome analysis and pharmaceutical substance development. A system implementing a platform for genome analysis and pharmaceutical substance development may comprise: an input unit that obtains data indicating the nucleotide sequence of a genome; an analysis unit for analyzing the data and determining, based on the data analysis results, the structure of proteins produced by the genome; a design unit that, based on the protein structure and the data analysis results, generates the structure of a compound usable as a pharmaceutical substance and analyzes the properties of the compound; and an output unit that outputs the data analysis results and outputs information relating to the structure and properties of the compound.
PCT/KR2022/006903 2021-05-17 2022-05-13 Procédé et système d'analyse génomique et de développement de substances pharmaceutiques à base d'intelligence artificielle WO2022245062A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2021-0063475 2021-05-17
KR1020210063475A KR102356257B1 (ko) 2021-05-17 2021-05-17 인공 지능 기반의 유전체 분석 및 의약 물질 개발 방법 및 시스템

Publications (1)

Publication Number Publication Date
WO2022245062A1 true WO2022245062A1 (fr) 2022-11-24

Family

ID=80266206

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/006903 WO2022245062A1 (fr) 2021-05-17 2022-05-13 Procédé et système d'analyse génomique et de développement de substances pharmaceutiques à base d'intelligence artificielle

Country Status (2)

Country Link
KR (1) KR102356257B1 (fr)
WO (1) WO2022245062A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102356257B1 (ko) * 2021-05-17 2022-02-09 (주)제이엘케이 인공 지능 기반의 유전체 분석 및 의약 물질 개발 방법 및 시스템
CN117831640B (zh) * 2024-03-05 2024-05-14 青岛国实科技集团有限公司 基于超算的医药产业数字孪生平台

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102356257B1 (ko) * 2021-05-17 2022-02-09 (주)제이엘케이 인공 지능 기반의 유전체 분석 및 의약 물질 개발 방법 및 시스템

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DOOSTPARAST TORSHIZI ABOLFAZL, WANG KAI: "Next-generation sequencing in drug development: target identification and genetically stratified clinical trials", DRUG DISCOVERY TODAY, vol. 23, no. 10, 1 October 2018 (2018-10-01), AMSTERDAM, NL , pages 1776 - 1783, XP093006428, ISSN: 1359-6446, DOI: 10.1016/j.drudis.2018.05.015 *
KRISHNAN SOWMYA RAMASWAMY, BUNG NAVNEET, BULUSU GOPALAKRISHNAN, ROY ARIJIT: "Accelerating De Novo Drug Design against Novel Proteins Using Deep Learning", JOURNAL OF CHEMICAL INFORMATION AND MODELING, vol. 61, no. 2, 22 February 2021 (2021-02-22), US , pages 621 - 630, XP093006431, ISSN: 1549-9596, DOI: 10.1021/acs.jcim.0c01060 *
MOUCHLIS VARNAVAS D., AFANTITIS ANTREAS, SERRA ANGELA, FRATELLO MICHELE, PAPADIAMANTIS ANASTASIOS G., AIDINIS VASSILIS, LYNCH ISEU: "Advances in De Novo Drug Design: From Conventional to Machine Learning Methods", INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, vol. 22, no. 4, 1 January 2021 (2021-01-01), pages 1 - 22, XP093006430, DOI: 10.3390/ijms22041676 *
SKALIC MIHA, SABBADIN DAVIDE, SATTAROV BORIS, SCIABOLA SIMONE, DE FABRITIIS GIANNI: "From Target to Drug: Generative Modeling for the Multimodal Structure-Based Ligand Design", MOLECULAR PHARMACEUTICS, vol. 16, no. 10, 7 October 2019 (2019-10-07), US , pages 4282 - 4291, XP093006425, ISSN: 1543-8384, DOI: 10.1021/acs.molpharmaceut.9b00634 *

Also Published As

Publication number Publication date
KR102356257B1 (ko) 2022-02-09

Similar Documents

Publication Publication Date Title
WO2022245062A1 (fr) Procédé et système d'analyse génomique et de développement de substances pharmaceutiques à base d'intelligence artificielle
Hufsky et al. Computational strategies to combat COVID-19: useful tools to accelerate SARS-CoV-2 and coronavirus research
Halu et al. The multiplex network of human diseases
Llarena et al. INNUENDO: a cross‐sectoral platform for the integration of genomics in the surveillance of food‐borne pathogens
Fotis et al. Network-based technologies for early drug discovery
Jagadeesh et al. Phrank measures phenotype sets similarity to greatly improve Mendelian diagnostic disease prioritization
Zook et al. A robust benchmark for germline structural variant detection
Hulovatyy et al. Exploring the structure and function of temporal networks with dynamic graphlets
Moni et al. How to build personalized multi-omics comorbidity profiles
WO2022245042A1 (fr) Système de construction de base de données médicales par prétraitement de données médicales et son procédé de fonctionnement
WO2020111378A1 (fr) Procédé et système pour analyser des données de façon à aider au diagnostic d'une maladie
WO2022103134A1 (fr) Système intégré de diagnostic de maladie et procédé de fonctionnement
WO2022245063A1 (fr) Méthode et système pour analyser des informations génomiques et médicales et pour développer une substance pharmaceutique sur la base d'une intelligence artificielle
Chen et al. Tissue-specific enhancer functional networks for associating distal regulatory regions to disease
WO2021149913A1 (fr) Procédé et dispositif permettant de sélectionner un gène lié à une maladie dans une analyse ngs
Chimusa et al. Post genome-wide association analysis: dissecting computational pathway/network-based approaches
Sintchenko et al. Pathogen genome bioinformatics
Zhao et al. Integration of omics and phenotypic data for precision medicine
Charkiewicz et al. The first SARS-CoV-2 genetic variants of concern (VOC) in Poland: The concept of a comprehensive approach to monitoring and surveillance of emerging variants
WO2019117400A1 (fr) Appareil et procédé de construction de réseau de gènes
Ansari et al. An approach to infer putative disease-specific mechanisms using neighboring gene networks
Chandonia et al. Lessons from the CAGI‐4 Hopkins clinical panel challenge
WO2022050624A1 (fr) Système d'analyse et d'évaluation du microbiome intestinal et procédé d'évaluation associé
McDermott et al. Defining the players in higher-order networks: predictive modeling for reverse engineering functional influence networks
WO2016208827A1 (fr) Procédé et dispositif d'analyse de gène

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22804917

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE