WO2022245063A1 - Method and system for analyzing genomic and medical information and developing a pharmaceutical substance based on artificial intelligence - Google Patents

Method and system for analyzing genomic and medical information and developing a pharmaceutical substance based on artificial intelligence

Info

Publication number
WO2022245063A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
analysis
artificial neural
neural network
compound
Prior art date
Application number
PCT/KR2022/006905
Other languages
English (en)
Korean (ko)
Inventor
김원태
김동민
강신욱
이명재
Original Assignee
(주)제이엘케이
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by (주)제이엘케이
Publication of WO2022245063A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Definitions

  • the present invention relates to the analysis of genome and medical information and the development of medicinal substances, and to a method and system for analyzing genome and medical information based on artificial intelligence and developing related medicinal substances.
  • the present invention is to provide a method and system for effective analysis of genes and medical information for diseases and development of pharmaceutical substances.
  • the present invention is to provide a method and system for analyzing genes based on artificial intelligence.
  • the present invention is to provide a method and system for analyzing medical information based on artificial intelligence.
  • the present invention is to provide a method and system for developing medicinal substances based on artificial intelligence.
  • a system for providing a genome and medical information analysis and pharmaceutical substance development platform includes: an input unit for acquiring first data representing a nucleotide sequence of a genome and second data including medical information; a first analyzer that analyzes the first data and confirms the structure of a protein produced by the genome based on the analysis result of the first data; and a second analyzer that analyzes the second data to ascertain information about symptoms occurring in the subject.
  • the analysis unit and the design unit may operate using at least one artificial neural network.
  • a method for analyzing genome and medical information and developing pharmaceutical substances includes: obtaining first data representing a nucleotide sequence of a genome and second data including medical information; analyzing the first data and confirming the structure of the protein produced by the genome based on the analysis result of the first data; confirming information about symptoms occurring in the subject by analyzing the second data; generating the structure of a compound for use as a pharmaceutical substance based on the structure of the protein, the analysis result of the first data, and the analysis result of the second data; analyzing the properties of the compound; and outputting the analysis result of the data and information related to the structure and properties of the compound.
  • the analysis of the data, the confirmation of the structure of the protein, the generation of the structure of the compound, and the analysis of the properties of the compound may be performed using at least one artificial neural network.
  • FIG. 1 is a diagram showing a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.
  • FIG. 2 is a diagram showing the structure of a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.
  • FIG. 3 is a diagram illustrating an operating procedure of a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.
  • FIG. 4 is a diagram showing an example of a genome analysis interface provided by a genome analysis platform according to an embodiment of the present invention.
  • FIG. 5 is a diagram showing an example of a drug substance development interface provided in a drug substance development platform according to an embodiment of the present invention.
  • FIG. 6 is a diagram showing the structure of an artificial neural network applicable to a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.
  • FIG. 7 is a diagram showing a connection structure of artificial neural networks applicable to a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.
  • FIG. 8 is a diagram illustrating a procedure for performing transfer learning in a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.
  • the present invention proposes a technique for diagnosing and analyzing diseases. Furthermore, the present invention proposes a technique for designing a compound that can be used as a pharmaceutical substance to respond to the diagnosed and analyzed disease, that is, a candidate substance.
  • FIG. 1 is a diagram showing a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.
  • the genome analysis and drug substance development system includes a local device 110a, a local device 110b, and a server 120 connected to a communication network. FIG. 1 illustrates two local devices, 110a and 110b.
  • the local device 110a and the local device 110b are used by a user who wants to diagnose and analyze a disease by utilizing the system.
  • the local device 110a and the local device 110b may acquire input data, transmit the input data to the server 120 through a communication network, and receive data including a result of analysis from the server 120.
  • the server 120 provides a platform for diagnosis and analysis of diseases and design of compounds to be used as medicinal substances according to embodiments of the present invention, and performs diagnosis, analysis, and design algorithms. According to various embodiments, algorithms for diagnosis, analysis, and design may be performed based on artificial intelligence.
  • the server 120 performs operations such as molecular diagnosis, genetic analysis, medical information analysis, and designing a compound for a medicinal substance based on data received from at least one of the local device 110a and the local device 110b, and transmits the resulting data to at least one of the local device 110a and the local device 110b.
  • the server 120 may be a cloud server.
  • the local device 110a and the local device 110b are terminals and perform data input and output functions, and the server 120 performs diagnosis, analysis, and design functions.
  • the local device 110a and the local device 110b may perform at least some calculations for diagnosis, analysis, and design. The degree of distribution of calculations for diagnosis, analysis, and design may be different for each local device.
  • a local device including all functions of the server 120 may also exist. In this case, the local device 110a or 110b can perform diagnosis, analysis, and design operations even when not connected to a communication network.
  • the server 120 may provide a platform for analysis/diagnosis of diseases and development/design of medicinal substances.
  • the platform according to an embodiment forms a green zone for new drug development and can provide disease diagnosis and prediction services.
  • the platform according to an embodiment provides a bioinformatics service based on genome big data information using artificial intelligence.
  • FIG. 2 is a diagram showing the structure of a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.
  • the components illustrated in FIG. 2 may be included in one of a local device (e.g., local device 110a or local device 110b in FIG. 1) and a server (e.g., server 120 in FIG. 1), and how each component is deployed on the local device and the server may vary according to various embodiments. Accordingly, connections between components may be based on internal circuits or external communication networks.
  • the system includes a data input unit 210, a genetic information analysis unit 220, a medical information analysis unit 230, a design unit 240, and an output unit 250.
  • the data input unit 210 is a means for inputting data to be analyzed.
  • data includes genome data and medical information data.
  • Medical information data is the result of a subject's medical examination and includes diagnostic charts and images (e.g., X-ray, CT (computed tomography), MRI (magnetic resonance imaging), PET (positron emission tomography)).
  • the genomic data includes at least one of data related to a subject (eg, a patient) or data related to a virus/bacterium that causes a disease.
  • the genomic data includes information on the DNA or RNA base sequence of a subject or virus/bacteria.
  • the data input unit 210 may receive genome data through an external communication network or receive genome data through local hardware (eg, a memory port or a user input device).
  • a user who wants to use the system may input genome data through the data input unit 210 .
  • the genome data may be input in the form of a file configured according to a predefined format.
  • the genetic information analyzer 220 analyzes genome data input through the data input unit 210.
  • the genetic information analysis unit 220 may analyze the genome data to determine the genetic characteristics present in the nucleotide sequence of the subject or virus/bacterium, and to estimate the protein structure or intracellular activity that can be generated according to the nucleotide sequence.
  • the protein structure generating unit 226 may obtain data related to diseases caused by the subject's genes as well as diseases caused by external viruses/bacteria.
  • the genetic information analysis unit 220 includes a genome analysis unit 222, a molecular diagnosis analysis unit 224, and a protein structure generation unit 226.
  • the genome analyzer 222 may analyze genome data using various techniques.
  • the genome analysis unit 222 identifies genetic characteristics by analyzing genome data of a subject.
  • the genome analysis unit 222 may check a mutation of a specific nucleotide sequence on the subject's genome.
  • the genome analysis unit 222 may analyze genome data using various analysis techniques such as single nucleotide polymorphism (SNP) analysis, single-strand conformation polymorphism (SSCP) analysis, amplified fragment length polymorphism (AFLP) analysis, random amplified polymorphic DNA (RAPD) analysis, allele-specific PCR (AS-PCR) analysis, dynamic allele-specific hybridization (DASH) analysis, whole-genome sequencing (WGS) analysis, and next-generation sequencing (NGS) analysis.
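  • As a minimal sketch of checking for a mutation at a specific position, the following compares a sample sequence against a reference sequence and reports single-base differences. It assumes the two sequences are already aligned and of equal length; the function name and the example sequences are hypothetical and are not taken from the patent, which does not specify how the genome analysis unit 222 is implemented.

```python
def find_substitutions(reference, sample):
    """Return (position, reference_base, sample_base) for every mismatch.

    Assumption: both sequences are already aligned and have equal length.
    Positions are 0-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, r, s)
        for i, (r, s) in enumerate(zip(reference.upper(), sample.upper()))
        if r != s
    ]


# Hypothetical example sequences (illustration only).
reference = "ATGGCCATTGTAATGGGCCGC"
sample = "ATGGCCATTGTTATGGGCCGC"
print(find_substitutions(reference, sample))
```

  • Real pipelines would align reads first and handle insertions and deletions; this sketch covers only single-base substitutions.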
  • the molecular diagnosis analyzer 224 performs molecular diagnosis based on genome data.
  • the molecular diagnosis analysis unit 224 identifies various molecular level activities that may occur within the subject's cells based on the genome data.
  • the molecular diagnostic analysis unit 224 detects changes at various molecular levels occurring within cells through numerical values or images.
  • the molecular diagnostic analysis unit 224 may perform nucleic acid analysis such as DNA or RNA, protein analysis, intracellular metabolome analysis, and the like.
  • the protein structure generation unit 226 predicts the structure of a protein that can be produced by the subject's gene based on the genome data. That is, the protein structure generation unit 226 predicts the structure of a disease-causing protein. If the genome data of the virus/bacterium is secured, the protein structure generation unit 226 can also predict the structure of a protein that can be produced by the gene of the virus/bacteria.
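  • The patent does not describe how the protein structure generation unit 226 predicts a structure. As a hedged illustration of only the preliminary step such a predictor would likely need, the sketch below translates a coding nucleotide sequence into an amino acid sequence using the standard genetic code. The function name and the example sequence are hypothetical; three-dimensional structure prediction itself is not shown.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence):
    """Translate a coding DNA/RNA sequence into one-letter amino acids.

    Translation starts at the first base and stops at the first stop codon
    or at the end of the sequence; trailing incomplete codons are ignored.
    """
    seq = sequence.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE.get(seq[i:i + 3], "X")  # X = unknown codon
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCATTGTAATGGGCCGCTGA"))
```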
  • the genetic information analysis unit 220 includes a genome analysis unit 222, a molecular diagnosis analysis unit 224, and a protein structure generation unit 226. Analysis operations performed by the genetic information analyzer 220 are performed using genome data (eg, electronic files) rather than actual genes. Accordingly, compared to analysis using actual genes, analysis can be performed within a short time and at low cost.
  • the genetic information analyzer 220 may use an Artificial Neural Network (ANN) based on machine learning or deep learning. In this case, the genetic information analyzer 220 may use a pre-learned artificial neural network or directly learn and use an initial artificial neural network.
  • the medical information analyzer 230 analyzes medical information data. By analyzing the medical information data, the medical information analyzer 230 may obtain information about symptoms of a disease occurring in the subject's body. Medical information data may include image data. In this case, for effective analysis, the medical information analyzer 230 may filter the image to improve image quality or reinforce main information. For example, filtering may include at least one of fast Fourier transform (FFT), histogram equalization, motion artifact removal, or noise canceling. In addition, the medical information analysis unit 230 may extract feature points from the medical image, determine symptoms based on the extracted feature points, and convert the determined symptoms into data. That is, based on the analysis result of the medical image, the medical information analyzer 230 may determine the subject's condition, whether or not the subject is infected, the progress of the disease, and the severity of the disease.
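  • Of the filtering options named above, histogram equalization is simple enough to sketch directly. The following is a minimal pure-Python version, assuming a grayscale image stored as a list of rows of integer intensities in the range 0 to 255; the function name and data layout are assumptions for illustration, not the patent's implementation.

```python
def equalize_histogram(image, levels=256):
    """Histogram-equalize a grayscale image (list of rows of ints)."""
    pixels = [p for row in image for p in row]
    if not pixels:
        return []

    # Count how often each intensity occurs.
    histogram = [0] * levels
    for p in pixels:
        histogram[p] += 1

    # Cumulative distribution of intensities.
    cdf = []
    running_total = 0
    for count in histogram:
        running_total += count
        cdf.append(running_total)

    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:
        # Constant image: nothing to stretch.
        return [row[:] for row in image]

    # Map each intensity so the cumulative distribution becomes roughly linear.
    lookup = [
        max(0, round((c - cdf_min) / (n - cdf_min) * (levels - 1)))
        for c in cdf
    ]
    return [[lookup[p] for p in row] for row in image]


low_contrast = [[100, 100], [101, 102]]
print(equalize_histogram(low_contrast))
```

  • The darkest input value maps to 0 and the brightest to 255, which spreads a narrow intensity range across the full scale.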
  • the design unit 240 designs compounds that can be used as medicinal substances based on the genome data and medical information input through the data input unit 210, the analysis result of the genome data by the genetic information analysis unit 220, and the analysis result of the medical information by the medical information analysis unit 230. To this end, the design unit 240 includes a characteristic compound generation unit 242 and an absorption, distribution, metabolism, excretion, toxicity (ADMET) prediction unit 244.
  • the characteristic compound generating unit 242 generates a characteristic compound structure corresponding to the protein structure predicted by the protein structure generating unit 226 .
  • a characteristic compound is created to have a structure capable of interacting with (e.g., binding to) the predicted protein, and to have properties capable of inhibiting disease by interacting with the predicted protein.
  • the ADMET prediction unit 244 performs absorption, distribution, metabolism, excretion, and toxicity tests on the compound produced by the characteristic compound generation unit 242.
  • the ADMET prediction unit 244 analyzes the ADMET profile and drug action mechanism of the compound generated by the characteristic compound generation unit 242. That is, the ADMET prediction unit 244 analyzes how the compound generated by the characteristic compound generation unit 242 acts in the cells of the subject based on the obtained genome data. In this case, the ADMET prediction unit 244 may build a mathematical model for the characteristics of the subject based on the obtained genome data, and generate an ADMET profile based on the constructed mathematical model.
  • the design unit 240 includes a characteristic compound generation unit 242 and an ADMET prediction unit 244. Analysis operations performed by the design unit 240 are performed using compound data (e.g., electronic files) rather than actual compounds. Accordingly, compared to analysis using actual compounds, design can be completed within a short period of time and at low cost.
  • the design unit 240 may use an artificial neural network based on machine learning or deep learning. In this case, an analysis result of the genome generated by the genetic information analyzer 220 and an analysis result of the medical image generated by the medical information analyzer 230 may be used as inputs of the artificial neural network. The design unit 240 may use a pre-trained artificial neural network or may itself train and use an initial artificial neural network.
  • the output unit 250 outputs results generated by the genetic information analysis unit 220 , the medical information analysis unit 230 , and the design unit 240 .
  • the output unit 250 may output the result as a tangible object (eg, printed matter) or in the form of a digital file.
  • the digital file may be stored in a storage unit in the system or transmitted to a pre-designated destination (eg, a designated e-mail, a designated phone number, or a designated application) through an external communication network.
  • the system may include an interface means for interaction with a user.
  • the system may include input means such as a keyboard, mouse, and touch screen, and display means such as a monitor, touch screen, and projector.
  • FIG. 3 is a diagram illustrating an operating procedure of a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.
  • the entity performing the operations of FIG. 3 is referred to as a 'system'.
  • in step 301, the system obtains genome data and medical information.
  • the obtained genomic data may include DNA or RNA nucleotide sequence data of the subject. Additionally, DNA or RNA nucleotide sequence data of a virus or fungus that causes a disease may be further included.
  • in step 303, the system analyzes the genomic data.
  • the system may perform various genome analysis techniques based on genome data.
  • the system may take the genome data as an input value and analyze the genome data using an artificial neural network.
  • the system can process genome data into a form that can be input into an artificial neural network.
  • the system can estimate the genetic characteristics, molecular activity, structure of the produced protein, etc. of the subject or virus/bacteria. Due to the use of an artificial neural network, the system can obtain analysis data on genomes without actual chemical experiments.
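  • One common way to process a nucleotide sequence into a form that a neural network can accept is one-hot encoding with padding to a fixed length. The patent does not name an encoding, so the sketch below is an assumption for illustration; the function name, the base ordering, and the choice to encode unknown bases as all zeros are hypothetical.

```python
# Column order of the one-hot vector (an arbitrary but fixed convention).
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def one_hot_encode(sequence, length):
    """Encode a DNA/RNA sequence as a length x 4 matrix of 0.0/1.0 values.

    Sequences longer than `length` are truncated; shorter ones are padded
    with all-zero rows. Unknown symbols (e.g. N) also become all-zero rows.
    """
    seq = sequence.upper().replace("U", "T")[:length]
    encoded = []
    for base in seq:
        row = [0.0, 0.0, 0.0, 0.0]
        index = BASE_INDEX.get(base)
        if index is not None:
            row[index] = 1.0
        encoded.append(row)
    # Pad so every input has the same shape.
    encoded.extend([0.0, 0.0, 0.0, 0.0] for _ in range(length - len(seq)))
    return encoded


for row in one_hot_encode("ACGN", 6):
    print(row)
```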
  • in step 305, the system analyzes the medical information.
  • the system may analyze a medical image of a subject and identify a symptom based on the analysis result.
  • the system may filter the image, extract feature points, and determine symptoms based on the extracted feature points.
  • the system may use an artificial neural network.
  • the system may use a convolutional neural network (CNN) to identify and classify lesions.
  • different artificial neural networks may be used according to the type of image.
  • in step 307, the system designs a characteristic compound and analyzes properties of the designed characteristic compound. That is, based on the result of the genome analysis performed in step 303 and the result of the medical information analysis performed in step 305, the system designs a compound that specifically reacts to the subject or virus/bacterium.
  • the system may design at least one of a compound that interacts with a disease-causing protein produced by the subject's genes, a compound that destroys viruses/bacteria, a compound that inhibits the activity of proteins produced by viruses/bacteria, and a compound that inhibits protein production by viruses/bacteria. A designed compound may be referred to as a candidate substance, candidate drug, or the like. The system then generates an ADMET profile for at least one designed compound.
  • the system can perform simulations to predict how the designed compound will behave within the subject's cells.
  • the system performs genome reactivity test, side effect test, and activity prediction for the candidate substance.
  • the operation of step 307 may be performed based on artificial intelligence using an artificial neural network. Due to the use of an artificial neural network, the system can acquire compounds and analytical data on compounds without actual chemical experiments.
  • the system outputs the analysis result.
  • the analysis result may be output in the form of a tangible object (eg, printed matter) or in the form of a digital file.
  • the digital file may be stored in a storage unit in the system or transmitted to a pre-designated destination (eg, a designated e-mail, a designated phone number, or a designated application) through an external communication network.
  • the above-described operations may be performed by one device (eg, the local devices 110a and 110b or the server 120), or may be performed by two or more devices.
  • when the operations are performed by two or more devices, an operation in which necessary information is exchanged between the devices may be added between some operations.
  • for example, if step 301 is performed by a local device and step 303 is performed by a server, an operation in which the local device transfers genome data to the server may be added prior to step 303.
  • as described above, the system designs a characteristic compound and analyzes the properties of the designed compound.
  • if the analyzed properties do not meet predetermined criteria, the system may exclude the compound from the candidate substances.
  • the system may redesign another compound or reselect another one of a plurality of previously designed compounds, and then analyze the properties. This operation may be repeated until a compound meeting the criteria is determined or until a predetermined failure condition is met.
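  • The design, analyze, and exclude cycle described above can be sketched as a simple loop. In the sketch, `analyze` and `meets_criteria` are hypothetical callables standing in for the property analysis and the acceptance criteria, and `max_attempts` stands in for the predetermined failure condition; none of these names come from the patent. The toxicity numbers are made-up placeholders, not results.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Profile = Dict[str, float]


def select_candidate(
    candidates: Iterable[str],
    analyze: Callable[[str], Profile],
    meets_criteria: Callable[[Profile], bool],
    max_attempts: int,
) -> Optional[Tuple[str, Profile]]:
    """Return the first candidate whose profile meets the criteria.

    Candidates that fail are excluded and the next one is tried. Returns
    None once `max_attempts` candidates were tried (the failure condition)
    or the candidates run out.
    """
    for attempt, compound in enumerate(candidates):
        if attempt >= max_attempts:
            break
        profile = analyze(compound)
        if meets_criteria(profile):
            return compound, profile
    return None


# Placeholder data for illustration only.
PLACEHOLDER_PROFILES = {
    "compound_a": {"toxicity": 0.9},
    "compound_b": {"toxicity": 0.2},
    "compound_c": {"toxicity": 0.1},
}

result = select_candidate(
    PLACEHOLDER_PROFILES,
    PLACEHOLDER_PROFILES.__getitem__,
    lambda profile: profile["toxicity"] < 0.5,
    max_attempts=10,
)
print(result)
```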
  • a user who wants to diagnose and analyze diseases and develop medicinal substances using the system can input necessary data and command analysis using a local device.
  • the local device or server provides results to the user by performing the operations described with reference to FIG. 3 .
  • the local device may provide an interface according to various embodiments for user interaction such as data input and command input. That is, the local device can display various interfaces using the display unit. Examples of interfaces are described below with reference to FIGS. 4 and 5 .
  • as shown in FIG. 4, the genome analysis interface includes a menu 410, a search bar 420, and function items 430.
  • Each of the function items 431 to 439 may be composed of an image and a name representing an analysis target or technique.
  • the function items 430 include at least one of an epigenomics analysis item 431, an exome analysis item 432, a genome-wide association study (GWAS) analysis item 433, a metabolomics analysis item 434, a metagenomics analysis item 435, a proteomics analysis item 436, a target sequencing item 437, a transcriptome analysis item 438, and a whole-genome sequencing (WGS) analysis item 439.
  • the user can proceed with the corresponding analysis technique by clicking or selecting the item of the function to be used for analysis.
  • in addition, tools for next-generation sequencing analysis, such as whole genome analysis, microRNA target gene and disease association prediction analysis, mass transcript analysis, and gene interaction network analysis, may be provided.
  • This platform can be understood as a convergence technology of disease diagnosis, drug substance development, and IT that enables the utilization of data that greatly increases every year according to the trend of big data and convergence and the improvement of NGS technology. That is, the present invention is based on analysis software for analyzing and interpreting vast amounts of genetic information in various fields, and can provide various researchers or companies with a platform that can be used in various industries.
  • as shown in FIG. 5, the pharmaceutical substance development interface includes a menu 510, a search bar 520, and function items 530.
  • Each of the function items 530 may be composed of an image and a name representing an analysis target or technique.
  • the function items 530 include an absorption, distribution, metabolism, excretion (ADME) analysis item 531, a basic science research item 532, a biomarker development item 533, a lead identification item 534, a lead optimization item 535, a protein interaction analysis item 536, a target validation item 537, and a toxicity analysis item 538.
  • the present invention is to provide a big data-based platform utilizing artificial intelligence. Through this, developers will be able to increase efficiency in the process of developing new drugs.
  • the system and platform analyzes a genome and provides functions for developing a medicinal substance based on the analysis result.
  • calculation algorithms for analysis and development are performed using data rather than actual genome samples.
  • calculation algorithms for analysis and development may be implemented based on artificial intelligence (AI).
  • Artificial intelligence means that machines such as computers perform thinking, learning, and analysis that are possible with human intelligence.
  • technologies that apply such artificial intelligence to the medical industry are increasing.
  • artificial neural networks are widely used. An example of an artificial neural network applicable to the present invention is shown in FIG. 6 below.
  • an artificial neural network includes an input layer 610, at least one hidden layer 620, and an output layer 630.
  • Each of the layers 610, 620, and 630 is composed of a plurality of nodes, and each node is connected to an output of at least one node belonging to the previous layer.
  • Each node computes the inner product of the output values of the nodes in the previous layer and the corresponding connection weights, adds a bias, applies a non-linear activation function to the result, and delivers the resulting output value to at least one node in the next layer.
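  • The per-node computation (weighted sum of the previous layer's outputs, plus a bias, passed through a non-linear activation) can be sketched in a few lines. The layer sizes, weights, and the choice of tanh as the activation are assumptions for illustration; the patent does not fix them.

```python
import math


def dense_forward(inputs, weights, biases, activation=math.tanh):
    """Compute the outputs of one fully connected layer.

    weights[j] holds the connection weights of node j, one per input;
    biases[j] is the bias of node j.
    """
    return [
        activation(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]


def network_forward(inputs, layers):
    """Pass inputs through a list of (weights, biases) layers in order."""
    values = inputs
    for weights, biases in layers:
        values = dense_forward(values, weights, biases)
    return values


# Hypothetical two-input network: one hidden layer of two nodes, one output.
hidden = ([[0.5, -0.25], [0.0, 1.0]], [0.0, -2.0])
output = ([[1.0, 1.0]], [0.0])
print(network_forward([1.0, 2.0], [hidden, output]))
```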
  • Artificial neural network models used in various embodiments of the present invention may include at least one of a fully convolutional neural network, a convolutional neural network, a recurrent neural network, a restricted Boltzmann machine (RBM), and a deep belief neural network (DBN), but are not limited thereto.
  • a deep learning-based model may be applied to extract features of an image, and a machine learning-based model may be applied when the image is classified or recognized based on the extracted features.
  • the machine learning-based model may include a Support Vector Machine (SVM), AdaBoost, and the like, but is not limited thereto.
  • an artificial neural network configured similarly to the example of FIG. 6 may be used.
  • the system according to an embodiment of the present invention may have a plurality of artificial neural networks.
  • a plurality of artificial neural networks may be classified according to functions.
  • the plurality of artificial neural networks may include at least one artificial neural network for genetic analysis and at least one artificial neural network for drug substance development.
  • at least one artificial neural network for gene analysis may be divided into at least one artificial neural network for genome analysis of a subject and at least one artificial neural network for genome analysis of viruses/bacteria.
  • the plurality of artificial neural networks may further include at least one artificial neural network for selecting an artificial neural network to be used for analysis or development.
  • FIG. 7 is a diagram showing a connection structure of artificial neural networks applicable to a genome analysis and pharmaceutical substance development system according to an embodiment of the present invention.
  • the system includes a first artificial neural network set 710 including a plurality of artificial neural networks for genome analysis, a second artificial neural network set 720 including a plurality of artificial neural networks for medical image data analysis, and A third artificial neural network set 730 including a plurality of artificial neural networks for drug substance development is included.
  • the plurality of artificial neural networks included in the first artificial neural network set 710 may be classified according to functions (eg, an analysis method, whether an analysis target is a subject or a virus/bacterium).
  • the first artificial neural network set 710 may include artificial neural networks corresponding to each of the items 431 to 439 illustrated in FIG. 4.
  • the plurality of artificial neural networks included in the first artificial neural network set 710 may have different structures. Specifically, the plurality of artificial neural networks may differ from each other in the number of input nodes, the shape of input values, and the like. Accordingly, the system can confirm a command by the user through the user interface and process the input genome data according to the form of input values required by the artificial neural network corresponding to the confirmed command.
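  • Selecting a network from the user's command and shaping the input to match it can be sketched as a registry lookup followed by a network-specific preprocessing step. The command names, network names, input sizes, and the scalar base encoding below are all hypothetical placeholders; the patent states only that networks may differ in the number of input nodes and the form of input values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkSpec:
    name: str       # hypothetical identifier of a trained network
    n_inputs: int   # number of input nodes the network expects


# Hypothetical registry: user command -> network specification.
REGISTRY = {
    "wgs": NetworkSpec(name="wgs_net", n_inputs=8),
    "exome": NetworkSpec(name="exome_net", n_inputs=4),
}

# Placeholder scalar encoding of bases; unknown symbols map to 0.0.
BASE_CODE = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}


def encode(sequence, n_inputs):
    """Truncate or zero-pad the encoded sequence to exactly n_inputs values."""
    values = [BASE_CODE.get(base, 0.0) for base in sequence.upper()[:n_inputs]]
    return values + [0.0] * (n_inputs - len(values))


def prepare_input(command, sequence):
    """Look up the network for a command and shape the input for it."""
    spec = REGISTRY[command]  # raises KeyError for an unknown command
    return spec.name, encode(sequence, spec.n_inputs)


print(prepare_input("exome", "ACGTAC"))
print(prepare_input("wgs", "AC"))
```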
  • the plurality of artificial neural networks included in the second artificial neural network set 720 may be classified according to functions (eg, an analysis method, whether an analysis target is a subject or a virus/bacteria).
  • the second artificial neural network set 720 may include artificial neural networks corresponding to types of medical images (eg, X-ray, CT, MRI).
  • the plurality of artificial neural networks included in the second artificial neural network set 720 may have different structures. Specifically, the plurality of artificial neural networks may differ from each other in the number of input nodes, the shape of input values, and the like. Accordingly, the system may confirm a user's command through the user interface and process the input medical image data according to the form of input values required by the artificial neural network corresponding to the confirmed command.
  • the plurality of artificial neural networks included in the third artificial neural network set 730 may be classified according to functions (eg, analysis method, whether the analysis target is a subject or a virus/bacterium).
  • the third artificial neural network set 730 may include artificial neural networks corresponding to each of the items 531 to 509 illustrated in FIG. 5 .
  • the plurality of artificial neural networks included in the third artificial neural network set 730 may have different structures. Specifically, the plurality of artificial neural networks may differ from each other in the number of input nodes, the shape of input values, and the like. Therefore, the system may confirm the user's command through the user interface and process the input genome analysis result data and medical image analysis result data according to the form of input values required by the artificial neural network corresponding to the confirmed command and the data type of the previously performed analysis results.
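The command-driven preprocessing described for the three network sets can be illustrated with a minimal sketch. It assumes a hypothetical registry that maps a user command to a network description; the names `NetworkSpec`, `REGISTRY`, `prepare_input`, the two commands, and the encoders are illustrative and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class NetworkSpec:
    """Hypothetical description of one network in a set (e.g., set 710)."""
    name: str
    input_length: int                     # number of input nodes
    encode: Callable[[str], List[float]]  # converts raw data to input values


def one_hot_dna(sequence: str) -> List[float]:
    """Encode each base as four values; unknown bases become all zeros."""
    table = {
        "A": [1.0, 0.0, 0.0, 0.0],
        "C": [0.0, 1.0, 0.0, 0.0],
        "G": [0.0, 0.0, 1.0, 0.0],
        "T": [0.0, 0.0, 0.0, 1.0],
    }
    values: List[float] = []
    for base in sequence.upper():
        values.extend(table.get(base, [0.0] * 4))
    return values


def gc_fraction(sequence: str) -> List[float]:
    """Encode a sequence as a single value: its G/C fraction."""
    bases = sequence.upper()
    if not bases:
        return [0.0]
    return [sum(base in "GC" for base in bases) / len(bases)]


def fit_length(values: List[float], length: int) -> List[float]:
    """Truncate or zero-pad so the vector matches the input node count."""
    clipped = values[:length]
    return clipped + [0.0] * (length - len(clipped))


# Assumed mapping from user commands to networks with different input forms.
REGISTRY: Dict[str, NetworkSpec] = {
    "variant_analysis": NetworkSpec("variant_net", 16, one_hot_dna),
    "composition_analysis": NetworkSpec("composition_net", 1, gc_fraction),
}


def prepare_input(command: str, raw_genome: str) -> List[float]:
    """Shape raw genome data into the form the selected network expects."""
    if command not in REGISTRY:
        raise KeyError(f"no network registered for command {command!r}")
    spec = REGISTRY[command]
    return fit_length(spec.encode(raw_genome), spec.input_length)
```

For instance, `prepare_input("variant_analysis", "ACG")` yields a 16-value vector in this sketch, while the same sequence under `"composition_analysis"` yields a single value.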
  • Artificial neural networks included in the second artificial neural network set 720 may be trained using medical image data. It may be difficult to secure sufficient learning data for medical images for reasons such as personal information protection. Accordingly, the artificial neural networks included in the second artificial neural network set 720 may be trained using learning data secured through data augmentation.
  • data augmentation may be performed on data about lesions in medical image data. That is, the system can secure learning data by extracting a lesion area from input medical image data and performing data augmentation on the extracted data on the lesion area.
  • data augmentation may include rotation, scaling, and the like.
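As a rough illustration of lesion-focused augmentation, the sketch below crops the lesion area from an image using a binary mask and then generates rotated and scaled copies. The mask-based cropping, the 90-degree rotation steps, and the integer nearest-neighbour scaling are assumptions made for simplicity; the patent only states that rotation, scaling, and the like may be used.

```python
from typing import List, Sequence

Image = List[List[float]]


def crop_lesion(image: Image, mask: Image) -> Image:
    """Return the tight bounding box around the nonzero pixels of the mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        raise ValueError("mask contains no lesion pixels")
    return [row[cols[0]:cols[-1] + 1] for row in image[rows[0]:rows[-1] + 1]]


def rotate90(patch: Image) -> Image:
    """Rotate a patch clockwise by 90 degrees."""
    return [list(col) for col in zip(*patch[::-1])]


def scale_nearest(patch: Image, factor: int) -> Image:
    """Upscale by an integer factor using nearest-neighbour repetition."""
    return [
        [value for value in row for _ in range(factor)]
        for row in patch
        for _ in range(factor)
    ]


def augment_lesion(image: Image, mask: Image,
                   factors: Sequence[int] = (1, 2)) -> List[Image]:
    """Produce four rotations of the lesion patch at each scale factor."""
    patch = crop_lesion(image, mask)
    samples: List[Image] = []
    for _ in range(4):
        for factor in factors:
            samples.append(scale_nearest(patch, factor))
        patch = rotate90(patch)
    return samples
```

A production pipeline would more likely use arbitrary-angle rotation and interpolated scaling from an imaging library; the pure-Python version is only meant to show the crop-then-augment order.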
  • the third artificial neural network set 730 includes artificial neural networks (hereinafter referred to as 'compound design models') for designing characteristic compounds.
  • the compound design model uses a result inferred from at least one of the artificial neural networks included in the first artificial neural network set 710 as an input.
  • the compound design model may use as an input the result of an attribute analysis (eg, an ADMET profile) of another compound (eg, a compound that was not selected as a final compound and dropped from the candidates).
  • the compound design model can use data on the interaction of proteins and compounds collected from the outside as an input.
  • the system may include a data collection unit (not shown) that performs data crawling.
  • the data collection unit collects information on compounds related to the structure of proteins obtained through genome analysis using an external data network, and provides them so that they can be used in a compound design model.
  • the data collection unit may collect information on compounds by periodically or event-based web search or academic database search. The collected information is processed and input to the input nodes of the compound design model.
  • the search range may vary based on the learning history for the protein structure obtained through genome analysis. For example, if the compound design model has previously been trained on the obtained protein structure or on a similar structure, the accuracy of inference through the compound design model can be expected to be high. Accordingly, the system may search past learning history data for the obtained protein structure and determine the amount or range of data to be collected through crawling according to the search result. Specifically, the system determines at least one of the number of keywords for data search, the content of the keywords, and the size of data to be collected, based on whether the same or a similar protein structure has previously been learned and on the degree of similarity to that previously learned protein structure.
  • the similarity of the protein structure may be determined based on the type, connection structure, size, etc. of amino acids constituting the protein.
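The dependence of the crawling scope on learning history can be sketched as follows. The similarity measure (overlap of adjacent amino-acid pairs combined with a length ratio), the thresholds, and the keyword and document limits are all hypothetical choices for illustration; the patent does not specify a formula or concrete values.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class CrawlPlan:
    """Hypothetical crawl settings derived from learning history."""
    best_similarity: float
    max_keywords: int
    max_documents: int


def residue_pairs(sequence: str) -> Set[str]:
    """Adjacent amino-acid pairs, a crude proxy for type and connectivity."""
    return {sequence[i:i + 2] for i in range(len(sequence) - 1)}


def similarity(a: str, b: str) -> float:
    """Combine pair overlap (Jaccard) with a size ratio, each weighted 0.5."""
    pairs_a, pairs_b = residue_pairs(a), residue_pairs(b)
    if not pairs_a or not pairs_b:
        return 0.0
    jaccard = len(pairs_a & pairs_b) / len(pairs_a | pairs_b)
    size_ratio = min(len(a), len(b)) / max(len(a), len(b))
    return 0.5 * jaccard + 0.5 * size_ratio


def plan_crawl(target: str, learned: Iterable[str]) -> CrawlPlan:
    """Collect less when a similar structure was already learned."""
    best = max((similarity(target, seq) for seq in learned), default=0.0)
    if best >= 0.8:      # assumed threshold: well covered by past learning
        return CrawlPlan(best, max_keywords=3, max_documents=50)
    if best >= 0.4:      # assumed threshold: partially covered
        return CrawlPlan(best, max_keywords=8, max_documents=200)
    return CrawlPlan(best, max_keywords=15, max_documents=1000)
```

Under these assumptions, a target with no similar entry in the learning history triggers the widest search, and an exact match triggers the narrowest.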
  • the system may include a storage unit (not shown) that stores information on learning data used for training artificial neural networks (e.g., artificial neural networks in the first artificial neural network set 710 and the third artificial neural network set 730).
  • the aforementioned data collection unit may also perform a function of collecting learning data.
  • the system performs genome analysis, molecular diagnosis, protein structure prediction, characteristic compound design, and compound property inspection based on input genome data.
  • the system can perform necessary functions while increasing the complexity of the algorithm step by step.
  • the system may use a plurality of artificial neural networks capable of performing the same function.
  • for example, the system may include a plurality of artificial neural networks for performing protein analysis (proteomics).
  • the plurality of artificial neural networks may have differences in inference accuracy and computational complexity/time due to their different structures.
  • the system first performs inference with one artificial neural network that meets the initially set accuracy and computational complexity/time and then, if the result does not converge, performs inference again using the next artificial neural network with higher inference accuracy.
  • whether or not the result has converged may be determined based on feedback from the user (e.g., an evaluation input for the result) and on whether a predefined criterion based on the determined attribute is satisfied (e.g., a comparison between a quantified attribute value and a threshold value).
  • whether a predefined criterion based on the determined attribute is satisfied may be fed back from the design unit 240 to the genetic information analysis unit 220 .
  • the system may differently set initial accuracy and computational complexity/time for each function. Also, the system can adjust the initial values of accuracy and computation time according to the number of functions the user wants to perform. Alternatively, the system may receive basic data about accuracy and calculation time from a user, determine and apply an initial value based on the input basic data.
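The stepwise escalation between networks that perform the same function can be sketched as a simple cascade. The sketch assumes that convergence is judged only by comparing a quantified attribute value against a threshold (user feedback is omitted), and the `Candidate` fields and `run_cascade` signature are illustrative names, not the patent's.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one network performing a given function."""
    name: str
    accuracy: float                # expected inference accuracy
    cost: float                    # relative computational complexity/time
    infer: Callable[[Any], float]  # returns a quantified attribute value


def run_cascade(
    candidates: List[Candidate],
    data: Any,
    threshold: float,
    start_accuracy: float = 0.0,
) -> Tuple[Optional[str], Optional[float], List[str]]:
    """Try networks in order of increasing accuracy until one converges.

    Networks below the initially set accuracy are skipped. The result is
    treated as converged when the attribute value reaches the threshold.
    """
    tried: List[str] = []
    for model in sorted(candidates, key=lambda c: c.accuracy):
        if model.accuracy < start_accuracy:
            continue
        tried.append(model.name)
        value = model.infer(data)
        if value >= threshold:
            return model.name, value, tried
    return None, None, tried
```

Starting from the cheapest acceptable network keeps computation time low in the common case, and the more accurate but more expensive networks are used only when the cheaper result fails the criterion.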
  • the system uses an artificial neural network to perform functions necessary for analysis and development. For accurate reasoning using an artificial neural network, it is required that the artificial neural network is properly trained. To this end, the system may further include a learning unit.
  • it may be difficult to expect accurate inference due to the lack of learning amount. In particular, when a system and a platform are built based on a local device rather than a server, deterioration in accuracy of inference due to lack of learning may be greater.
  • Transfer learning is a learning method that applies variables (e.g., weight values) of another, already trained artificial neural network to the artificial neural network to be trained. That is, during the initial installation process of the system, transfer learning using an artificial neural network of another system according to an embodiment of the present invention may be performed according to a user's command or upon satisfaction of a given condition. Transfer learning may be performed through a procedure as shown in FIG. 8 below.
  • system A and system B are independent systems providing a platform according to embodiments of the present invention, and may be built based on a local device or a server.
  • system B registers at least one trained artificial neural network in a sharable state. That is, the user of system B registers that the learned artificial neural network is allowed to be used for transfer learning of another system (eg, system A).
  • information on registration or not may be stored in a separate support server.
  • system B may display a list of artificial neural networks through display means, confirm the user's selection and registration command, and then perform processing for registration (e.g., creating a shared whitelist or notifying a separate support server).
  • system A sends a request for shared information.
  • system A requests information about at least one artificial neural network that can be shared.
  • system B transmits shared information.
  • the sharing information includes a list of at least one sharable artificial neural network.
  • the list may include structure information, function information, and learning amount information of at least one sharable artificial neural network.
  • system A selects at least one artificial neural network to use for transfer learning. That is, system A selects at least one artificial neural network to be used for transfer learning of an artificial neural network to be learned by system A from among artificial neural networks included in shared information provided from system B. For example, system A may select an artificial neural network of system B having the same function as the artificial neural network to be trained.
  • in this example, shared information is received only from system B, but system A may also receive shared information from other systems, comprehensively consider the shared information provided from a plurality of systems, and select an artificial neural network to be used for transfer learning.
  • system A sends a request for information related to the selected artificial neural network.
  • system B transmits information related to the requested artificial neural network.
  • Information related to the artificial neural network may include information necessary for transfer learning (eg, weight values).
  • system A performs transfer learning.
  • system A performs transfer learning based on information related to the artificial neural network provided from system B.
  • system A may reuse all of the weight values included in the received information as it is in its own artificial neural network or selectively reuse only some of them. Whether or not to partially reuse may be determined based on the learning amount of the artificial neural network provided from system B, similarity in structure, and the like.
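The exchange between system A and system B, together with selective weight reuse, can be sketched in a few functions. The class and function names, the dictionary layout of the shared information, and the rule "reuse a layer only when its name and shape match" are assumptions made for illustration; the patent leaves the partial-reuse criterion open (learning amount, structural similarity, and the like).

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Matrix = List[List[float]]


def shape(matrix: Matrix) -> Tuple[int, int]:
    """Rows and columns of a weight matrix."""
    return (len(matrix), len(matrix[0]) if matrix else 0)


@dataclass
class Network:
    name: str
    function: str
    learning_amount: int
    weights: Dict[str, Matrix]


class SystemB:
    """Hypothetical provider side of the transfer learning procedure."""

    def __init__(self) -> None:
        self._networks: Dict[str, Network] = {}
        self._shareable: Set[str] = set()

    def register(self, net: Network, shareable: bool) -> None:
        """Store a trained network and optionally mark it as shareable."""
        self._networks[net.name] = net
        if shareable:
            self._shareable.add(net.name)

    def shared_info(self) -> List[dict]:
        """List structure, function, and learning amount of shareable nets."""
        return [
            {
                "name": net.name,
                "function": net.function,
                "learning_amount": net.learning_amount,
                "structure": {k: shape(w) for k, w in net.weights.items()},
            }
            for net in self._networks.values()
            if net.name in self._shareable
        ]

    def network_info(self, name: str) -> Dict[str, Matrix]:
        """Return a copy of the weights of a shareable network."""
        if name not in self._shareable:
            raise PermissionError(f"{name} is not registered as shareable")
        weights = self._networks[name].weights
        return {k: [row[:] for row in w] for k, w in weights.items()}


def select_network(shared: List[dict], function: str) -> str:
    """System A picks the same-function network with the most learning."""
    matches = [entry for entry in shared if entry["function"] == function]
    if not matches:
        raise LookupError(f"no shareable network for function {function!r}")
    return max(matches, key=lambda entry: entry["learning_amount"])["name"]


def transfer(own: Dict[str, Matrix], received: Dict[str, Matrix]) -> List[str]:
    """Reuse received weights only for layers whose shapes match."""
    reused: List[str] = []
    for layer, weights in received.items():
        if layer in own and shape(own[layer]) == shape(weights):
            own[layer] = [row[:] for row in weights]
            reused.append(layer)
    return reused
```

In this sketch, layers with mismatched shapes keep system A's own initial values and would then be trained locally.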
  • transfer learning may be performed through interaction between different systems.
  • a separate support system supporting interaction between systems may exist.
  • each system registers, with the support system, information on the artificial neural networks it allows to be shared, and a system desiring transfer learning can acquire the data and information necessary for transfer learning by performing request-response signaling with the support system.
  • the support system provides information on the learning performance of the artificial neural network in another system (e.g., system B) after the transfer learning of the corresponding system (e.g., system A), and the system that received the information (e.g., system A) may display information on the progress of additional learning of the artificial neural network in the other system to the user through an interface.
  • the system may display information on additional learning progress in real time through a separate notification, or display information on additional learning progress when a corresponding function is commanded. Accordingly, since the user may request to perform transfer learning again, interaction between systems may be increased.
  • information on learning performance of the artificial neural network may be directly received from another system.
  • various embodiments of the present invention may be implemented by hardware, firmware, software, or a combination thereof.
  • in the case of hardware implementation, an embodiment may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), general processors, controllers, microcontrollers, microprocessors, or the like.
  • the scope of the present invention includes software or machine-executable instructions (e.g., operating systems, applications, firmware, programs, etc.) that cause operations according to the methods of various embodiments to be executed on a device or computer, and a non-transitory computer-readable medium in which such software or instructions are stored and executable on a device or computer.
  • the above information can be applied to various fields of developing pharmaceutical substances based on artificial intelligence.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • General Engineering & Computer Science (AREA)
  • Public Health (AREA)
  • Bioethics (AREA)
  • Genetics & Genomics (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The present invention relates to the analysis of genomic and medical information and the development of a pharmaceutical substance based on artificial intelligence. A system for providing a platform that analyzes a genome and develops a pharmaceutical substance may comprise: an input unit for acquiring first data indicating the sequencing of a genome and second data including medical information; a first analysis unit for analyzing the first data and identifying the protein structure generated by the genome on the basis of the analysis result; a second analysis unit for analyzing the second data to identify information about symptoms appearing in a patient; a design unit that generates, on the basis of the protein structure, the analysis result of the first data, and the analysis result of the second data, the structure of a compound to be used as a pharmaceutical substance, and that analyzes the attributes of the compound; and an output unit for outputting information relating to the data analysis results and to the structure and attributes of the compound.
PCT/KR2022/006905 2021-05-17 2022-05-13 Method and system for analyzing genomic and medical information and for developing a pharmaceutical substance based on artificial intelligence WO2022245063A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020210063476A KR102363456B1 (en) 2021-05-17 2021-05-17 Method and system for artificial intelligence-based analysis of genome and medical information and development of pharmaceutical substances
KR10-2021-0063476 2021-05-17

Publications (1)

Publication Number Publication Date
WO2022245063A1 true WO2022245063A1 (en) 2022-11-24

Family

ID=80474823

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/006905 WO2022245063A1 (en) 2021-05-17 2022-05-13 Method and system for analyzing genomic and medical information and for developing a pharmaceutical substance based on artificial intelligence

Country Status (2)

Country Link
KR (1) KR102363456B1 (fr)
WO (1) WO2022245063A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102363456B1 (en) * 2021-05-17 2022-02-16 (주)제이엘케이 Method and system for artificial intelligence-based analysis of genome and medical information and development of pharmaceutical substances

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102363456B1 (en) * 2021-05-17 2022-02-16 (주)제이엘케이 Method and system for artificial intelligence-based analysis of genome and medical information and development of pharmaceutical substances

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102363456B1 (en) * 2021-05-17 2022-02-16 (주)제이엘케이 Method and system for artificial intelligence-based analysis of genome and medical information and development of pharmaceutical substances

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
BALA SURAYYA ADO, KANT SHRI OJHA, YAKASAI ADAMU GARBA: "Deep Learning In Medical Imaging And Drug Design", JOURNAL OF HUMAN PHYSIOLOGY, vol. 2, no. 2, 31 January 2021 (2021-01-31), XP093006429, DOI: 10.30564/jhp.v2i2.2683 *
DOOSTPARAST TORSHIZI ABOLFAZL, WANG KAI: "Next-generation sequencing in drug development: target identification and genetically stratified clinical trials", DRUG DISCOVERY TODAY, ELSEVIER, AMSTERDAM, NL, vol. 23, no. 10, 1 October 2018 (2018-10-01), AMSTERDAM, NL , pages 1776 - 1783, XP093006428, ISSN: 1359-6446, DOI: 10.1016/j.drudis.2018.05.015 *
KRISHNAN SOWMYA RAMASWAMY, BUNG NAVNEET, BULUSU GOPALAKRISHNAN, ROY ARIJIT: "Accelerating De Novo Drug Design against Novel Proteins Using Deep Learning", JOURNAL OF CHEMICAL INFORMATION AND MODELING, AMERICAN CHEMICAL SOCIETY , WASHINGTON DC, US, vol. 61, no. 2, 22 February 2021 (2021-02-22), US , pages 621 - 630, XP093006431, ISSN: 1549-9596, DOI: 10.1021/acs.jcim.0c01060 *
MOUCHLIS VARNAVAS D., AFANTITIS ANTREAS, SERRA ANGELA, FRATELLO MICHELE, PAPADIAMANTIS ANASTASIOS G., AIDINIS VASSILIS, LYNCH ISEU: "Advances in De Novo Drug Design: From Conventional to Machine Learning Methods", INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, vol. 22, no. 4, 1 January 2021 (2021-01-01), pages 1676, XP093006430, DOI: 10.3390/ijms22041676 *
SKALIC MIHA, SABBADIN DAVIDE, SATTAROV BORIS, SCIABOLA SIMONE, DE FABRITIIS GIANNI: "From Target to Drug: Generative Modeling for the Multimodal Structure-Based Ligand Design", MOLECULAR PHARMACEUTICS, AMERICAN CHEMICAL SOCIETY, US, vol. 16, no. 10, 7 October 2019 (2019-10-07), US , pages 4282 - 4291, XP093006425, ISSN: 1543-8384, DOI: 10.1021/acs.molpharmaceut.9b00634 *

Also Published As

Publication number Publication date
KR102363456B1 (ko) 2022-02-16

Similar Documents

Publication Publication Date Title
WO2022245062A1 (en) Method and system for artificial intelligence-based genome analysis and pharmaceutical substance development
Singh et al. Diagnosis of COVID-19 from chest X-ray images using wavelets-based depthwise convolution network
Peng et al. COVID-19-CT-CXR: a freely accessible and weakly labeled chest X-ray and CT image collection on COVID-19 from biomedical literature
Quiroz-Juárez et al. Identification of high-risk COVID-19 patients using machine learning
Jagadeesh et al. Phrank measures phenotype sets similarity to greatly improve Mendelian diagnostic disease prioritization
WO2022245042A1 (en) System for building a medical database by preprocessing medical data, and operating method thereof
Malik et al. CDC_Net: Multi-classification convolutional neural network model for detection of COVID-19, pneumothorax, pneumonia, lung Cancer, and tuberculosis using chest X-rays
WO2017095014A1 (en) System for diagnosing cell abnormality using DNN learning, and diagnosis management method thereof
WO2022103134A1 (en) Integrated disease diagnosis system and operating method
WO2022245063A1 (en) Method and system for analyzing genomic and medical information and for developing a pharmaceutical substance based on artificial intelligence
Qayyum et al. Depth-wise dense neural network for automatic COVID19 infection detection and diagnosis
WO2020111378A1 (en) Method and system for analyzing data to assist in disease diagnosis
Swayamsiddha et al. The prospective of artificial intelligence in COVID-19 pandemic
Roy et al. Early prediction of COVID-19 using ensemble of transfer learning
WO2022197044A1 (en) Method for diagnosing a bladder lesion using a neural network, and system therefor
Sintchenko et al. Pathogen genome bioinformatics
Svahn et al. Genome-wide networks reveal emergence of epidemic strains of Salmonella Enteritidis
WO2021091348A1 (en) Method and apparatus for selecting a new drug repositioning candidate
Fazle Rabbi et al. A convolutional neural network model for screening covid-19 patients based on ct scan images
Thirukrishna et al. Survey on diagnosing CORONA VIRUS from radiography chest X-ray images using convolutional neural networks
WO2019117400A1 (en) Apparatus and method for constructing a gene network
WO2019225798A1 (en) Method and device for selecting a question from multiple psychological test sheets on the basis of machine learning to rapidly diagnose symptoms of anxiety and depression
Pustokhina et al. A novel machine learning–based detection and diagnosis model for coronavirus disease (COVID-19) using discrete wavelet transform with rough neural network
WO2022050624A1 (en) System for analyzing and evaluating the gut microbiome, and associated evaluation method
WO2016208827A1 (en) Method and device for gene analysis

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22804918

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE