CN113889181A - Medical event analysis method and device, computer equipment and storage medium

Info

Publication number
CN113889181A
CN113889181A CN202010627570.9A CN202010627570A CN113889181A CN 113889181 A CN113889181 A CN 113889181A CN 202010627570 A CN202010627570 A CN 202010627570A CN 113889181 A CN113889181 A CN 113889181A
Authority
CN
China
Prior art keywords
data
interaction
analyzing
mathematical
medical event
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010627570.9A
Other languages
Chinese (zh)
Inventor
乔楠
张雷
李萍
张晨逸
徐迟
刘登辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202010627570.9A
Publication of CN113889181A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30 Drug targeting using structural data; Docking or binding prediction
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Data Mining & Analysis (AREA)
  • Public Health (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Epidemiology (AREA)
  • Bioethics (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)

Abstract

The application discloses a method for analyzing a medical event and belongs to the technical field of medical analysis. The method comprises: acquiring multi-omics data and interaction data of an organism, wherein the interaction data represent interaction relationships among the multi-omics data of the organism or among internal components of single-omics data; and analyzing the multi-omics data and the interaction data to obtain an analysis result of the medical event. The method and the device improve the accuracy of analyzing medical events.

Description

Medical event analysis method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of medical analysis technologies, and in particular, to a method and an apparatus for analyzing a medical event, a computer device, and a storage medium.
Background
In the field of medical analysis technology, analyzing medical events based on the omics data of organisms has become a current trend. For example, using multi-omics data for drug target mining and cancer-related gene discovery has become an essential research topic for pharmaceutical research companies, research institutes, hospitals, and the like.
However, the accuracy of analyzing medical events based on omics data of organisms is still low at present.
Disclosure of Invention
The application provides a method and a device for analyzing a medical event, a computer device and a storage medium, which can solve the problem that the accuracy of analyzing the medical event according to omics data of organisms is still low.
In a first aspect, the present application provides a method for analyzing a medical event. The method comprises: acquiring multi-omics data and interaction data of an organism, wherein the interaction data represent interaction relationships among the multi-omics data of the organism or among internal components of single-omics data; and analyzing the multi-omics data and the interaction data to obtain an analysis result of the medical event.
The analysis result of the medical event is obtained by acquiring the multi-omics data and interaction data of the organism and analyzing them. The interaction data represent interaction relationships among the multi-omics data or among internal components of single-omics data. The method therefore considers not only the multi-omics data but also the interaction relationships among the multi-omics data or among internal components of single-omics data, so that the data used to analyze the medical event can express the characteristics of the organism itself, which effectively improves the accuracy of analyzing the medical event.
Optionally, the multi-omics data of the organism comprise any one or more of the following: gene mutation data, gene expression data, deoxyribonucleic acid (DNA) methylation data, copy number variation data, micro ribonucleic acid (microRNA) expression data, histone modification data, gene fusion data, chromosome isomerism data, and metabolite expression data.
Optionally, the medical event comprises any one or more of the following events: analyzing the sensitivity of an organism to a drug, analyzing the susceptibility of an organism to gene interference, analyzing the type of disorder the organism suffers from, analyzing the species the organism belongs to, analyzing the causative gene of the disorder the organism suffers from, analyzing the type of organism, and analyzing the remaining life time of the organism.
In one implementation, analyzing the multi-omics data and the interaction data to obtain the analysis result of the medical event includes: performing data fusion based on the multi-omics data and the interaction data; and analyzing the fused data to obtain the analysis result of the medical event.
Various complex interactions generally exist among different omics data in an organism and among internal components of single-omics data (for example, between genes, or between metabolites and proteins). Examples include the type of a gene mutation influencing the expression level of a gene, and activation, inhibition, or co-expression relationships between genes. Therefore, when a medical event is analyzed, fusing the multi-omics data with the interaction data yields data that express the characteristics of the organism. This helps to comprehensively analyze the influence of different omics features on life activities and to reveal the nature of life at a deeper level, which further improves the accuracy of analyzing the medical event.
Moreover, the interaction data used in the analysis may be the interaction data corresponding to the omics data used in the analysis. In that case the omics data and the interaction data are more closely linked, so analyzing the medical event with these interaction data can further improve the accuracy of predicting the medical event.
Optionally, performing data fusion based on the multi-omics data and the interaction data includes: converting the multi-omics data into the same multidimensional data space, wherein the data of each dimension in the multidimensional data space represent one aspect of the factors influencing the analysis result of the medical event; updating the multi-omics data converted into the multidimensional data space based on the interaction data; and performing feature fusion on the updated multi-omics data and the multi-omics data converted into the multidimensional data space.
In a first implementation of converting the multi-omics data into the same multidimensional data space, for first omics data representing a first omics level, the value of the dimension data representing the first omics level in the first multidimensional data space is set to the value of the first omics data, and the values of the other dimension data, which represent the other omics levels in the first multidimensional data space, are set to 0. The first omics level is any one of the omics levels represented by the multi-omics data.
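The first implementation can be illustrated with a minimal sketch. The level names in `OMICS_LEVELS`, their order, and the function name `embed_in_shared_space` are assumptions made for illustration and do not come from the application.

```python
# Hypothetical omics levels; the names and their order are assumptions.
OMICS_LEVELS = ["mutation", "expression", "methylation"]


def embed_in_shared_space(level, values):
    """Map the per-gene values of one omics level into the shared space.

    Each gene becomes a vector with one slot per omics level. The slot that
    belongs to `level` holds the measured value; every other slot is 0.
    """
    slot = OMICS_LEVELS.index(level)
    embedded = []
    for value in values:
        vector = [0.0] * len(OMICS_LEVELS)
        vector[slot] = float(value)
        embedded.append(vector)
    return embedded


# Two genes measured at the expression level (illustrative values).
expression = embed_in_shared_space("expression", [2.5, 0.1])
print(expression)
```

After this step, vectors from every omics level have the same length, so they can be handled uniformly.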
In a second implementation of converting the multi-omics data into the same multidimensional data space, a first multidimensional data space with at least three dimensions is established. The dimensions of the first multidimensional data space respectively represent the identifier of the sample, the identifier of the type of the omics data (that is, the type of omics level that the omics data represent), and the information carried by the omics data. Then, for each omics data item, the values that represent the information of each of these dimensions are determined, such as the identifier of the sample to which the omics data belong, the identifier of the omics type, and the value of the omics level that the omics data represent. These values are then placed on the corresponding dimensions of the first multidimensional data space, so that the multi-omics data are represented in the first multidimensional data space.
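The second implementation can be sketched as a three-axis nested list. The record layout `(sample, omics_type, gene, value)`, the identifiers `s1`, `g1` and so on, and the function name `build_tensor` are hypothetical choices for illustration only.

```python
def build_tensor(records, samples, omics_types, genes):
    """Place (sample, omics_type, gene, value) records into a 3-D nested list.

    Axis 0 is the sample identifier, axis 1 the omics type identifier, and
    axis 2 holds the per-gene value. Entries without a record stay 0.0.
    """
    sample_index = {s: i for i, s in enumerate(samples)}
    omics_index = {o: i for i, o in enumerate(omics_types)}
    gene_index = {g: i for i, g in enumerate(genes)}
    tensor = [[[0.0 for _ in genes] for _ in omics_types] for _ in samples]
    for sample, omics, gene, value in records:
        tensor[sample_index[sample]][omics_index[omics]][gene_index[gene]] = float(value)
    return tensor


# Illustrative records; all identifiers and values are made up.
records = [
    ("s1", "mutation", "g1", 1),
    ("s1", "expression", "g1", 3.2),
    ("s2", "expression", "g2", 7.5),
]
tensor = build_tensor(records, ["s1", "s2"], ["mutation", "expression"], ["g1", "g2"])
print(tensor[0][1][0])
```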
Because data in the same data space have the same data properties, representing the multi-omics data in the same data space means that the multi-omics data are represented by data with the same properties, which addresses the heterogeneity among the multi-omics data.
Further, converting the multi-omics data into the same multidimensional data space may also include: stacking the multi-omics data represented in the first multidimensional data space, with different omics data serving as different channels. Stacking refers to combining the multi-omics data represented in the first multidimensional data space so that the combined data are logically represented as one set of data.
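As a rough illustration of stacking, the sketch below treats each omics layer as one channel and combines the layers so that every gene carries one value per channel. The function name `stack_as_channels` and the sample values are assumptions.

```python
def stack_as_channels(per_omics_values):
    """Stack equally long per-gene value lists into one combined structure.

    The result has one entry per gene, and each entry holds one value per
    omics channel, in the order in which the layers were passed in.
    """
    lengths = {len(values) for values in per_omics_values}
    if len(lengths) != 1:
        raise ValueError("all omics layers must cover the same genes")
    return [list(channel_values) for channel_values in zip(*per_omics_values)]


# Two omics layers over the same three genes (illustrative values).
mutation = [1.0, 0.0, 0.0]
expression = [3.2, 7.5, 0.4]
stacked = stack_as_channels([mutation, expression])
print(stacked)
```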
In one implementation, a dimension is added to the first multidimensional data space to obtain a second multidimensional data space. The added dimension represents the identifier of the omics data, which yields the omics data represented in the second multidimensional data space.
By representing the omics data in the second multidimensional data space, the levels of the multi-omics data can be integrated along the omics dimension, which addresses the hierarchical nature of the omics data.
Optionally, the interaction data may be fused with the multi-omics data converted into the multidimensional data space, so that the interaction data are used to update the multi-omics data converted into the multidimensional data space. In one implementation, the interaction data and the multi-omics data converted into the multidimensional data space are fused by feeding them into an AI model that takes graph-structured data as input, such as a graph convolutional neural network.
Updating the multi-omics data converted into the multidimensional data space with the interaction data effectively integrates the interaction data and the omics data. The integrated data contain the data properties of the different omics levels as well as the interaction relationships among the multi-omics data of the organism or among internal components of single-omics data. When a medical event is analyzed based on the integrated data, more information is available for the analysis, which effectively improves its accuracy.
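The idea of updating node features with interaction data can be sketched as a single neighbourhood-averaging step. This is a simplification of a graph convolution layer: a real layer would also apply learned weights and a nonlinearity, and the application does not specify the normalization. The function name `propagate` and all values are assumptions.

```python
def propagate(adjacency, features):
    """One neighbourhood-averaging step over a graph.

    Each node's new feature vector is the mean of its own vector and the
    vectors of its neighbours, so information flows along the edges.
    """
    n = len(adjacency)
    updated = []
    for i in range(n):
        # Include the node itself (a self-loop) so it keeps part of its signal.
        neighbours = [j for j in range(n) if adjacency[i][j] or j == i]
        dim = len(features[i])
        updated.append([
            sum(features[j][d] for j in neighbours) / len(neighbours)
            for d in range(dim)
        ])
    return updated


# Genes 0 and 1 interact; gene 2 is isolated (illustrative data).
adjacency = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
features = [[1.0, 3.0], [0.0, 7.0], [0.0, 0.5]]
updated = propagate(adjacency, features)
print(updated)
```

The isolated node keeps its original features, while the two connected nodes end up with shared, averaged features.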
Performing feature fusion on the updated multi-omics data and the multi-omics data converted into the multidimensional data space includes: performing feature extraction on the updated multi-omics data to obtain first feature data; performing feature extraction on the multi-omics data converted into the multidimensional data space to obtain second feature data; and then performing feature fusion on the first feature data and the second feature data.
Optionally, feature extraction on the updated multi-omics data may be implemented by performing a convolution operation on them, and feature extraction on the multi-omics data converted into the multidimensional data space may likewise be implemented by performing a convolution operation on them. The convolution kernel may be selected according to the features required for analyzing the medical event, so as to emphasize those features. Further, when the first feature data are obtained, one or more convolution operations may be performed on the updated multi-omics data as needed; and/or, when the second feature data are obtained, one or more convolution operations may be performed on the multi-omics data converted into the multidimensional data space as needed. The convolution operation may be performed by a convolution layer in the AI model.
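A minimal sketch of the two feature-extraction branches follows. The application does not state how the first and second feature data are fused, so concatenation is used here purely as an assumption. The kernel values and the names `conv1d` and `fuse` are also illustrative.

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D cross-correlation, as applied by a convolution layer."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def fuse(updated, original, kernel):
    """Extract features from both inputs, then fuse them.

    Concatenation is an assumed fusion rule; the source only says that the
    first and second feature data are fused.
    """
    first_features = conv1d(updated, kernel)
    second_features = conv1d(original, kernel)
    return first_features + second_features


original = [1.0, 0.0, 2.0, 4.0]   # omics data in the shared space
updated = [0.5, 0.5, 3.0, 3.0]    # the same data after the interaction update
kernel = [1.0, -1.0]              # hypothetical difference kernel
fused = fuse(updated, original, kernel)
print(fused)
```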
Optionally, before the multi-omics data and the interaction data are analyzed, the method for analyzing the medical event further includes: screening the interaction data based on the confidence of the interaction relationships in the interaction data to obtain updated interaction data, so as to reduce the amount of computation in subsequent steps.
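Confidence-based screening can be sketched as a simple threshold filter. The edge format `(node_a, node_b, confidence)`, the threshold of 0.7, and the scores are hypothetical and are not taken from the application.

```python
def screen_interactions(edges, min_confidence):
    """Keep only the interactions whose confidence reaches the threshold."""
    return [
        (a, b, confidence)
        for a, b, confidence in edges
        if confidence >= min_confidence
    ]


# Illustrative edges with made-up confidence scores.
edges = [("g1", "g2", 0.95), ("g1", "g3", 0.40), ("g3", "g4", 0.80)]
kept = screen_interactions(edges, 0.7)
print(kept)
```

Dropping low-confidence edges makes the interaction graph sparser, which reduces the work done in later graph operations.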
In one implementation, the interaction data that are analyzed are represented using an adjacency matrix or a graph structure.
And/or, the interaction data comprises any one or more of the following: protein interaction networks, gene regulation networks, gene co-expression networks, and metabolic networks.
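For illustration, an interaction network given as an edge list can be turned into an adjacency matrix as follows. Treating interactions as undirected is an assumption; a gene regulation network, for example, would normally be directed. The function name `to_adjacency` and the node names are hypothetical.

```python
def to_adjacency(nodes, edges):
    """Build a symmetric 0/1 adjacency matrix from an undirected edge list."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1  # undirected: mirror the entry
    return matrix


adjacency = to_adjacency(["g1", "g2", "g3"], [("g1", "g2")])
print(adjacency)
```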
In one implementation, analyzing the multi-omics data and the interaction data to obtain the analysis result of the medical event includes: analyzing the multi-omics data and the interaction data with an AI model to obtain the analysis result of the medical event, wherein the AI model is trained on sample multi-omics data and sample interaction data.
In a second aspect, the present application provides an apparatus for analyzing a medical event, comprising: an acquisition module, configured to acquire multi-omics data and interaction data of an organism, wherein the interaction data represent interaction relationships among the multi-omics data of the organism or among internal components of single-omics data; and an analysis module, configured to analyze the multi-omics data and the interaction data to obtain an analysis result of the medical event.
Optionally, the multi-omics data of the organism comprise any one or more of the following: gene mutation data, gene expression data, deoxyribonucleic acid (DNA) methylation data, copy number variation data, micro ribonucleic acid (microRNA) expression data, histone modification data, gene fusion data, chromosome isomerism data, and metabolite expression data.
Optionally, the medical event comprises any one or more of the following events: analyzing the sensitivity of an organism to a drug, analyzing the susceptibility of an organism to gene interference, analyzing the type of disorder the organism suffers from, analyzing the species the organism belongs to, analyzing the causative gene of the disorder the organism suffers from, analyzing the type of organism, and analyzing the remaining life time of the organism.
Optionally, the analysis module includes: a fusion submodule, configured to perform data fusion based on the multi-omics data and the interaction data; and an analysis submodule, configured to analyze the fused data to obtain the analysis result of the medical event.
Optionally, the fusion submodule is specifically configured to: convert the multi-omics data into the same multidimensional data space, wherein the data of each dimension in the multidimensional data space represent one aspect of the factors influencing the analysis result of the medical event; update the multi-omics data converted into the multidimensional data space based on the interaction data; and perform feature fusion on the updated multi-omics data and the multi-omics data converted into the multidimensional data space.
Optionally, the apparatus for analyzing a medical event further comprises: a preprocessing submodule, configured to screen the interaction data based on the confidence of the interaction relationships in the interaction data to obtain updated interaction data.
Optionally, the interaction data that are analyzed are represented using an adjacency matrix or a graph structure.
And/or, the interaction data comprises any one or more of the following: protein interaction networks, gene regulation networks, gene co-expression networks, and metabolic networks.
Optionally, the analysis module is specifically configured to: analyze the multi-omics data and the interaction data with an AI model to obtain the analysis result of the medical event, wherein the AI model is trained on sample multi-omics data and sample interaction data.
In a third aspect, the present application provides a computing device comprising a processor and a memory; the memory has a computer program stored therein; the computer program, when executed by a processor, causes a computing device to perform the method of analyzing a medical event as provided in the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, which may be a non-transitory readable storage medium, and when instructions in the computer-readable storage medium are executed by a computer, the computer implements the method for analyzing a medical event provided in the first aspect. The storage medium includes, but is not limited to, volatile memory such as random access memory, and non-volatile memory such as flash memory, Hard Disk Drive (HDD), and Solid State Drive (SSD).
In a fifth aspect, the present application provides a computer program product comprising computer instructions which, when executed by a computing device, cause the computing device to perform the method for analyzing a medical event provided in the first aspect.
The computer program product may be a software installation package, which can be downloaded and executed on a computing device when the method for analyzing a medical event of the first aspect is to be used.
Drawings
FIG. 1 is a schematic structural diagram of an apparatus for analyzing a medical event according to an embodiment of the present application;
FIG. 2 is a schematic deployment diagram of an apparatus for analyzing a medical event according to an embodiment of the present application;
FIG. 3 is a schematic deployment diagram of another medical event analysis device provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a computing device according to an embodiment of the present application;
FIG. 5 is a flow chart of a method for analyzing a medical event provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a pre-processing of a gene co-expression network according to an embodiment of the present disclosure;
FIG. 7 is a flowchart of a method for data fusion based on multi-omics data and interaction data provided by an embodiment of the present application;
FIG. 8 is a schematic diagram illustrating data fusion based on multi-omics data and interaction data according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an analysis module provided in an embodiment of the present application;
FIG. 10 is a schematic structural diagram of another medical event analysis device provided in an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The embodiments of the present application provide a method for analyzing a medical event, which obtains an analysis result of the medical event by acquiring multi-omics data and interaction data of an organism and analyzing them. The interaction data represent interaction relationships among the multi-omics data or among internal components of single-omics data. For example, the interaction data may include biomolecular interaction networks such as a protein interaction network, a gene regulation network, a gene co-expression network, and a metabolic network. The method therefore considers not only the multi-omics data but also the interaction relationships among the multi-omics data or among internal components of single-omics data, so that the data used to analyze the medical event can express the characteristics of the organism itself, which effectively improves the accuracy of analyzing the medical event.
For ease of understanding, the terms used in the embodiments of the present application are explained below.
Artificial intelligence (AI) refers to the discipline of building computer systems that can simulate human intelligence, based on computer science and integrating knowledge from fields such as information theory, psychology, physiology, linguistics, logic, and mathematics. AI currently receives wide attention from academia and industry, is applied more and more broadly, and exceeds the level of ordinary humans in many application fields. For example, applying AI technology to machine vision (human recognition, image classification, object detection, and the like) has made machine vision more accurate than humans, and AI technology is also applied successfully in fields such as natural language processing and recommendation systems.
Machine learning is a core means of realizing AI. For the technical problem to be solved, a computer builds an AI model from existing data and then uses the AI model to produce analysis results. Because this approach lets the computer solve the technical problem by simulating human learning abilities (such as cognition, discrimination, and classification), it is called machine learning.
Deep learning is a machine learning technique based on deep neural network algorithms, characterized mainly by processing and analyzing data through multiple nonlinear transformations. It is widely applied in scenarios such as image recognition, speech recognition, natural language processing, and medical image data analysis.
Graph deep learning is a machine learning technique that applies deep learning algorithms to graph-structured data. Graph neural networks and graph convolutional neural networks are typical examples of graph deep learning.
An AI model is a mathematical model (for example, a neural network model) used when implementing AI applications through machine learning; in essence it is an algorithm comprising a large number of parameters and calculation formulas (or calculation rules). An AI model can learn the intrinsic patterns and representation hierarchy of the input data to obtain a nonlinear function describing the mapping between input and output, and then process and analyze new input data according to that function. AI models can be used in many application scenarios, such as biology, medicine, and transportation. For example, when the medical event is predicting the sensitivity of a cell line to a drug, multi-omics data of the cell line, such as gene mutation data and gene expression data, may be input into the AI model to predict that sensitivity. AI models are diverse, and different AI models can be used for different application scenarios and medical events.
Multi-omics data refers to data indicating omics levels of an organism, such as gene mutation, gene expression, protein expression, epigenetic modification, and metabolic regulation. The development and maturation of an organism are influenced by omics levels such as gene mutation, gene expression, epigenetic modification, and metabolic regulation. For example, the multi-omics data of an organism include data generated at multiple omics levels, such as the genome, epigenome, transcriptome, proteome, metabolome, and lipidome. The interaction relationships among the multi-omics data or among internal components of single-omics data are referred to as interaction data.
A biomolecular interaction network refers to a system formed by multiple interrelated subunits of an organism (subunits are also called components of the organism, such as genes and proteins) and is used to represent the interaction relationships between the subunits. The nodes of the network represent subunits, and the edges represent interaction relationships between subunits, including complex relationships such as activation and inhibition. Biomolecular interaction networks provide a mathematical representation of the connections between subunits found in ecological, evolutionary, and physiological studies.
A protein-protein interaction (PPI) network refers to a biomolecular interaction network that represents the interactions between proteins in the cells of an organism. The nodes of the network represent proteins, and the edges represent interactions between proteins. Protein interaction networks are the most intensively analyzed networks in biology, and many detection methods are currently available for identifying these interactions.
A gene regulatory network refers to a biomolecular interaction network that represents how the activity of genes is regulated by transcription factors bound to deoxyribonucleic acid (DNA). Some nodes of the network represent genes, other nodes represent transcription factors bound to DNA, and the edges represent the regulatory effects between genes and the DNA-bound transcription factors. Most transcription factors can bind to multiple binding sites in the genome, so all cells have a complex gene regulatory network. For example, the human genome encodes approximately 1400 transcription factors that regulate the expression of over 20000 human genes. Current techniques for studying gene regulatory networks include chromatin immunoprecipitation combined with microarrays (ChIP-chip) and chromatin immunoprecipitation combined with sequencing (ChIP-seq), which is used for binding site analysis.
A gene co-expression network refers to a biomolecular interaction network that represents co-expression relationships between genes. The nodes of the network represent genes, and an edge indicates that two genes have a significant co-expression relationship. Once gene expression profiles have been established for different samples or different experimental conditions, a gene co-expression network can be built by identifying genes that show similar expression patterns across the samples.
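One common way to build such a network is sketched below. It assumes Pearson correlation as the similarity measure and an absolute correlation threshold of 0.9; neither choice is stated in the application. The gene names and expression profiles are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long profiles.

    Assumes neither profile is constant; a constant profile would make the
    denominator zero.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def coexpression_edges(profiles, threshold):
    """Connect every gene pair whose absolute correlation reaches the threshold."""
    genes = sorted(profiles)
    edges = []
    for i, gene_a in enumerate(genes):
        for gene_b in genes[i + 1:]:
            if abs(pearson(profiles[gene_a], profiles[gene_b])) >= threshold:
                edges.append((gene_a, gene_b))
    return edges


# Expression of three hypothetical genes across four samples.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [1.0, 3.0, 2.0, 1.0],
}
edges = coexpression_edges(profiles, 0.9)
print(edges)
```

Here only `g1` and `g2` are connected, because their profiles rise together across the samples while `g3` follows a different pattern.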
Metabolic networks (metabolic networks) refer to networks of biomolecular interactions that represent biochemical reactions between various chemical substances in living cells. Wherein, the nodes of the metabolic network represent chemical substances in the living cells, and the edges of the metabolic network represent biochemical reactions between various chemical substances in the living cells. Biochemical reactions may be catalyzed by enzymes for converting one chemical species to another. There is a conversion relationship between all chemicals in a cell, and therefore all chemicals in a cell are part of the metabolic network.
In the embodiments of the present application, the operations of acquiring multi-omics data and interaction data of an organism and analyzing the multi-omics data and the interaction data to obtain an analysis result of a medical event may be performed by an apparatus for analyzing a medical event. Fig. 1 is a schematic structural diagram of an apparatus 10 for analyzing a medical event according to an embodiment of the present application. It should be understood that fig. 1 merely illustrates the structure of the apparatus 10 by way of example, and the present application does not limit how the modules in the apparatus 10 are divided. As shown in fig. 1, the apparatus 10 for analyzing a medical event comprises: an acquisition module 101 and an analysis module 102.
The acquisition module 101 is configured to acquire multi-omics data and interaction data of an organism, where the interaction data represents interaction relationships between different omics data of the organism or between internal components of single-omics data. Optionally, the acquisition module 101 is further configured to preprocess the acquired multi-omics data and interaction data so that they can be better used for analyzing and predicting medical events.
The analysis module 102 is configured to analyze the multi-omics data and the interaction data to obtain an analysis result of the medical event. The interaction data represents interaction relationships between different omics data of the organism or between internal components of single-omics data.
The acquisition module 101 acquires the multi-omics data and the interaction data of the organism, and the analysis module 102 analyzes them to obtain the analysis result of the medical event. The analysis method provided by the embodiments of the present application therefore considers not only the multi-omics data but also the interaction relationships between different omics data or between internal components of single-omics data. As a result, the data used for analyzing the medical event can better express the characteristics of the organism, which effectively improves the accuracy of analyzing the medical event.
In one implementation, a medical event analysis model may be deployed in the analysis module 102, and the medical event analysis model is configured to analyze the multi-omics data and the interaction data to obtain the analysis result of the medical event. The medical event analysis model may be implemented by an AI model. In this case, the analysis module 102 may input the received multi-omics data and interaction data into the medical event analysis model, analyze them using the model, and output, according to the analysis result of the model, an analysis value that indicates the result of analyzing the medical event from the multi-omics data and the interaction data.
In addition, the medical event analysis model may include a plurality of submodels that cooperate to implement the functions of the analysis module 102. Illustratively, when the analysis module 102 is specifically configured to perform data fusion based on the multi-omics data and the interaction data and to analyze the fused data to obtain the analysis result of the medical event, the analysis module 102 may include a fusion submodel and an analysis submodel. The fusion submodel is used for performing data fusion based on the multi-omics data and the interaction data and outputting the fused data. The analysis submodel is used for analyzing the fused data to obtain the analysis result of the medical event and outputting an analysis value according to the analysis result. Both the fusion submodel and the analysis submodel can be implemented by AI models.
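The division into an acquisition module, an analysis module, a fusion submodel and an analysis submodel can be sketched structurally as follows. This is only an illustrative wiring under stated assumptions: every class and function name is hypothetical, and the toy fusion (averaging each omics value over a gene and its interaction neighbours) and the toy analysis (taking the mean) are placeholders standing in for the AI models, which are not specified here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Omics = Dict[str, List[float]]   # gene -> one value per omics type
Edges = List[Tuple[str, str]]    # interaction relationships between genes


@dataclass
class AnalysisModule:
    """Holds a fusion submodel and an analysis submodel (both pluggable)."""
    fuse: Callable[[Omics, Edges], List[float]]
    analyze: Callable[[List[float]], float]

    def run(self, omics: Omics, edges: Edges) -> float:
        return self.analyze(self.fuse(omics, edges))


@dataclass
class MedicalEventAnalyzer:
    """Acquisition module feeding an analysis module."""
    acquire: Callable[[], Tuple[Omics, Edges]]
    analysis: AnalysisModule

    def run(self) -> float:
        omics, edges = self.acquire()
        return self.analysis.run(omics, edges)


def toy_fuse(omics: Omics, edges: Edges) -> List[float]:
    """Placeholder fusion: average each omics column over a gene and its neighbours."""
    neighbours = {gene: [] for gene in omics}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    fused = []
    for gene in sorted(omics):
        group = [gene] + neighbours[gene]
        for col in range(len(omics[gene])):
            fused.append(sum(omics[g][col] for g in group) / len(group))
    return fused


def toy_analyze(fused: List[float]) -> float:
    """Placeholder analysis: the mean of the fused values."""
    return sum(fused) / len(fused)


def toy_acquire() -> Tuple[Omics, Edges]:
    """Placeholder acquisition returning hypothetical data."""
    return {"G1": [0.25, 0.5], "G2": [0.75, 1.0]}, [("G1", "G2")]


analyzer = MedicalEventAnalyzer(toy_acquire, AnalysisModule(toy_fuse, toy_analyze))
analysis_value = analyzer.run()
print(analysis_value)
```

Passing the submodels in as callables keeps the sketch agnostic about how fusion and analysis are actually realized.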
Fig. 2 is a deployment diagram of an analysis apparatus for a medical event according to an embodiment of the present application. As shown in fig. 2, the analysis device 10 of the medical event may be deployed in a cloud environment. The cloud environment is an entity which provides cloud services to users by using basic resources in a cloud computing mode. A cloud environment includes a cloud data center that includes a large number of infrastructure resources (including computing resources, storage resources, and network resources) owned by a cloud service provider, and a cloud service platform, and the computing resources included in the cloud data center may be a large number of computing devices (e.g., servers).
Optionally, the analysis apparatus 10 for the medical event may be a server in the cloud data center for analyzing the medical event, a virtual machine created in the cloud data center for analyzing the medical event, or a software device deployed on the server or the virtual machine in the cloud data center. When the analysis apparatus 10 of the medical event is a software apparatus deployed on a server or a virtual machine in the cloud data center, the software apparatus may be deployed distributively on a plurality of servers, or distributively on a plurality of virtual machines, or distributively on a virtual machine and a server.
As shown in fig. 2, the apparatus 10 for analyzing a medical event may be abstracted by a cloud service provider on a cloud service platform into a cloud service for analyzing medical events. After a user purchases the cloud service on the cloud service platform, the cloud environment may use the apparatus 10 to provide the user with the cloud service for analyzing medical events. The user may use a terminal to upload the multi-omics data and the interaction data that affect the result of the medical event to the cloud environment through an application program interface (API) or a web interface provided by the cloud service platform, so that the apparatus 10 can analyze the medical event according to the multi-omics data and the interaction data. After the analysis is completed, the apparatus 10 may send the analysis result to the terminal used by the user, or may store the analysis result in the cloud environment, for example by presenting it on a web interface of the cloud service platform for the user to view.
Alternatively, the analysis apparatus 10 for the medical event may be distributed by a service provider in the form of an application program, and the user may download the application program to a terminal used by the user and use the application program in the terminal.
When the apparatus 10 for analyzing a medical event is a software apparatus, it may be logically divided into a plurality of parts, each part having a different function. For example, as shown in fig. 1, the apparatus 10 may comprise: an acquisition module 101 and an analysis module 102. The acquisition module 101 may acquire the multi-omics data and the interaction data of the organism and send them to the analysis module 102. The analysis module 102 analyzes the multi-omics data and the interaction data to obtain the analysis result of the medical event.
In one implementation, the parts of the apparatus 10 for analyzing a medical event may be deployed in different environments or devices, and the parts deployed in different environments cooperate to implement the analysis method for medical events provided by the embodiments of the present application. The parts may be deployed in any two or all three of a terminal computing device, an edge environment, and a cloud environment. The terminal computing device includes: a terminal server, a smartphone, a notebook computer, a tablet computer, a personal desktop computer, a smart camera, and the like. An edge environment is an environment that includes a set of edge computing devices located close to the terminal computing device. The edge computing devices include: edge servers, edge stations with computing capability, and the like.
For example, one part of the apparatus 10 for analyzing a medical event is deployed in a cloud data center (specifically, on a server or a virtual machine in the cloud data center), and the other part is deployed in an edge data center (specifically, on a server or a virtual machine in the edge data center), where the edge data center is a set of edge computing devices deployed close to the terminal. The parts of the apparatus 10 deployed in different environments or devices cooperate to implement the function of analyzing medical events from multi-omics data. In one scenario, the apparatus 10 may include: an acquisition module 101 and an analysis module 102. As shown in fig. 3, the acquisition module 101 is deployed in the edge data center and the analysis module 102 is deployed in the cloud data center. After the edge data center acquires the multi-omics data and the interaction data through the acquisition module 101, it may send them to the analysis module 102 in the cloud data center, so that the analysis module 102 analyzes the medical event and sends the analysis result to the terminal used by the user.
It should be understood that the present application does not limit how the parts of the apparatus 10 for analyzing a medical event are deployed; in practice, the deployment may be adapted to the computing capability of the terminal computing device or to specific application requirements. Likewise, the way the apparatus 10 is divided into parts is not limited to the division described above, which is merely an example.
In an implementation manner, when the analyzing device of the medical event is a software device, the analyzing device of the medical event can be separately deployed on one computing device in any environment. As shown in fig. 4, the computing device 20 includes a bus 201, a processor 202, a communication interface 203, and a memory 204. The processor 202, memory 204, and communication interface 203 communicate via a bus 201.
The processor 202 may be an integrated circuit chip having signal processing capability. In implementation, some or all of the functions of the analysis method for medical events provided by the embodiments of the present application may be implemented by integrated logic circuits of hardware in the processor 202 or by instructions in the form of software. The processor 202 may also be an application-specific integrated circuit (ASIC), a programmable logic device (PLD) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or a combination of some or all of the above. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The processor 202 may also be a general-purpose processor, which may be a microprocessor or any conventional processor, for example, a central processing unit (CPU), a graphics processing unit (GPU), a network processor (NP), or a combination of some or all of a CPU, a GPU, and an NP. The steps of the method disclosed in connection with the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an erasable programmable read-only memory, or a register. The storage medium is located in the memory 204, and the processor 202 reads the information in the memory 204 and, in combination with its hardware, completes part of the functions of the analysis method for medical events of the embodiments of the present application.
Memory 204 stores computer instructions and data. For example, the memory 204 stores executable code included in the medical event analysis device, and the processor 202 reads the executable code in the memory 204 to execute the medical event analysis method provided by the embodiments of the present application. The memory 204 may be either volatile memory or nonvolatile memory, or may include a combination of both. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD). The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example, but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM). Further, the memory 204 may also include other software modules required for running processes, such as an operating system. The operating system may be LINUX™, UNIX™, WINDOWS™, and the like.
The communication interface 203 uses a transceiver module, such as but not limited to a transceiver, to enable communication between the computing device 20 and other devices or communication networks.
Bus 201 may include a pathway to transfer information between various components of computing device 20, such as processor 202, communication interface 203, and memory 204.
The following describes an implementation process of the analysis method for medical events provided in the embodiments of the present application. As shown in fig. 5, the method for analyzing a medical event may include the steps of:
Step 601, acquiring multi-omics data and interaction data of the organism.
The interaction data represents interaction relationships between different omics data of the organism or between internal components of single-omics data. For example, the interaction data of the organism may be a biomolecular interaction network, and may include any one or more of the following: protein interaction networks, gene regulatory networks, gene co-expression networks, and metabolic networks.
Multi-omics data refers to data representing omics levels of an organism such as gene mutation, gene expression, protein expression, epigenetic modification, and metabolic regulation. For example, the multi-omics data of an organism includes any one or more of the following: gene mutation data, gene expression data, DNA methylation data, copy number variation data, microRNA (miRNA) expression data, histone modification data, gene fusion data, chromosome isomerism data, and metabolite expression data. It should be noted that the above are only some examples of interaction data and multi-omics data. In different applications, the interaction data and the multi-omics data may also be data other than the above examples, and the embodiments of the present application are not limited thereto.
The multi-omics data and the interaction data of the organism can be obtained in various ways. In one implementation, a collection device may be used to collect a sample of the organism to obtain its multi-omics data and interaction data. For example, gene sequencing can be used to obtain genomics data, and ChIP can be used to obtain interaction data such as gene regulatory networks. Depending on the application requirements, acquisition with a collection device can be divided into at least two types. One is direct acquisition: the collection device collects data directly from a sample of the organism, and the collected data is used directly as the multi-omics data and the interaction data. The other is indirect acquisition: after a sample of the organism is obtained, a preset analysis operation is performed on the sample, and the analysis result is used as the multi-omics data and the interaction data. For example, when acquiring omics data of a cell line of an organism, the acquisition process may comprise: first collecting a sample tissue that includes the sample cell line, and then obtaining the omics data of the sample cell line in the sample tissue by means such as gene sequencing.
In another implementation, the multi-omics data and the interaction data may be obtained directly from third parties. Currently, many research institutes develop and publish sample libraries that include multi-omics data and interaction data. For example, the Genomics of Drug Sensitivity in Cancer (GDSC) database developed by the Wellcome Sanger Institute in the UK publishes genomic data, the STRING database publishes a human protein interaction network, and The Cancer Genome Atlas (TCGA) publishes a pan-cancer data set. Thus, the required multi-omics data and interaction data can be obtained directly from such sample libraries.
Step 602, preprocessing the multi-omics data.
After the multi-omics data is acquired, it may be preprocessed to reduce the amount of computation in subsequent processing. Optionally, preprocessing such as normalization (also called standardization) can be performed on the multi-omics data so that every omics data is represented by values in the same data interval. For example, each omics data may be normalized to the range [0, 1].
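The normalization step can be illustrated with a short sketch. The text only requires that the values end up in a common interval such as [0, 1]; min-max scaling is assumed here as one common way to achieve that, and the function name and sample values are hypothetical.

```python
def min_max_normalize(values):
    """Scale a list of numbers linearly into the range [0, 1].

    Min-max scaling is an assumption of this sketch; the text only
    requires that the data end up in a common interval.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so map them all to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical expression values of one gene across four samples.
expression = [2.0, 4.0, 6.0, 10.0]
normalized = min_max_normalize(expression)
print(normalized)  # [0.0, 0.25, 0.5, 1.0]
```

Applying the same scaling to every omics type puts, for example, binary mutation indicators and continuous expression levels on a comparable scale.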
In addition, the acquired multi-omics data may include omics data derived from a plurality of samples. In step 602, the multi-omics data may be further screened by sample to obtain a plurality of samples that each include data representing the same omics levels, and the omics data obtained by the screening may include one or more omics data. For example, suppose the acquired multi-omics data is derived from 5780 samples, and each of the 5780 samples includes omics data representing the genomic level of mutations in 5769 genes and omics data representing the transcriptomic level of the expression of the same 5769 genes. Screening then yields gene mutation data of 5780 samples × 5769 gene features and gene expression data of 5780 samples × 5769 gene features.
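The sample screening described above can be sketched as keeping only the samples that appear in every omics table. The data layout (one dictionary per omics type, keyed by sample identifier), the function name and the sample identifiers are assumptions made for illustration.

```python
def screen_samples(omics_tables):
    """Keep only samples that are present in every omics table.

    omics_tables: {omics_name: {sample_id: feature_vector}}
    Returns tables restricted to the common samples, in sorted order.
    """
    common = set.intersection(*(set(table) for table in omics_tables.values()))
    return {
        name: {sample: table[sample] for sample in sorted(common)}
        for name, table in omics_tables.items()
    }


# Hypothetical data: S2 lacks expression data and S4 lacks mutation data.
omics_tables = {
    "mutation": {"S1": [0, 1], "S2": [1, 0], "S3": [1, 1]},
    "expression": {"S1": [0.3, 0.9], "S3": [0.5, 0.1], "S4": [0.7, 0.2]},
}
screened = screen_samples(omics_tables)
print(screened)
```

After screening, every remaining sample has a row in each omics table, which is what allows the tables to be stacked into samples × features matrices of equal height.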
Step 603, preprocessing the interaction data.
After the interaction data is acquired, it may be preprocessed to reduce the amount of computation in subsequent processing. Optionally, since the interaction data represents interaction relationships between different omics data of the organism or between internal components of single-omics data, the confidence of some or all of the interaction relationships in the interaction data may be obtained, and the interaction data may be screened based on those confidences to obtain updated interaction data. For example, the confidences of all interaction relationships in the interaction data may be obtained, and the interaction relationships whose confidence is greater than a specified confidence threshold may be selected, resulting in interaction data with higher confidence. The specified confidence threshold may be set according to actual requirements, for example to 0.9, and the embodiments of the present application do not specifically limit it.
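The confidence-based screening can be sketched as a simple filter. The tuple layout (two node names plus a confidence score), the function name and the example scores are hypothetical; the threshold of 0.9 is the example value mentioned above.

```python
def filter_by_confidence(interactions, threshold=0.9):
    """Keep interaction relationships whose confidence exceeds the threshold.

    interactions: list of (node_a, node_b, confidence) tuples.
    """
    return [(a, b, conf) for a, b, conf in interactions if conf > threshold]


# Hypothetical protein interaction records with made-up confidence scores.
interactions = [
    ("P1", "P2", 0.99),
    ("P1", "P3", 0.42),
    ("P2", "P4", 0.95),
    ("P3", "P4", 0.90),
]
high_confidence = filter_by_confidence(interactions)
print(high_confidence)  # [('P1', 'P2', 0.99), ('P2', 'P4', 0.95)]
```

Because the text says "greater than" the threshold, the record with a confidence of exactly 0.90 is dropped in this sketch.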
The interaction data can also be screened according to the objects whose omics levels are represented by the multi-omics data, so as to obtain interaction data that relates to the same objects as the multi-omics data. For example, if the multi-omics data reflects the omics levels of a plurality of genes, the interaction data can be screened according to those genes, so that the genes involved in the screened interaction data are the same as the genes involved in the multi-omics data.
When the medical event is subsequently analyzed according to the multi-omics data and the interaction data, the analysis relies on data about the same objects in both. If the interaction data includes data about an object that the multi-omics data does not cover, that part of the interaction data is useless for the analysis. Screening the interaction data by object therefore removes data that would be useless in the subsequent analysis, which reduces data redundancy and the memory required by the analysis. For example, if the multi-omics data comprises omics data of 5780 samples × 5769 gene features, the interaction data can be screened according to those gene features, and the data in the interaction data that does not concern the 5769 gene features can be deleted, yielding interaction data that covers the 5769 gene features and contains 260104 interaction relationships between proteins.
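The object-based screening can be sketched as follows: an interaction relationship is kept only if both of its endpoints are features covered by the multi-omics data. The function name and the gene identifiers are hypothetical.

```python
def filter_by_features(interactions, features):
    """Keep only interactions whose two endpoints both appear in `features`.

    interactions: list of (node_a, node_b) pairs.
    features: iterable of feature names covered by the multi-omics data.
    """
    keep = set(features)
    return [(a, b) for a, b in interactions if a in keep and b in keep]


# Hypothetical example: G9 has no omics data, so its interaction is dropped.
omics_features = ["G1", "G2", "G3"]
interactions = [("G1", "G2"), ("G2", "G9"), ("G3", "G1")]
matched = filter_by_features(interactions, omics_features)
print(matched)  # [('G1', 'G2'), ('G3', 'G1')]
```

Using a set for the feature lookup keeps the filter linear in the number of interactions, which matters when the network has hundreds of thousands of edges.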
Further, the interaction data may also be converted into a data type that can be recognized by the apparatus used for analyzing the medical event. For example, when the medical event is analyzed using an AI model whose input is graph-structured data (e.g., a graph convolutional neural network), the interaction data may be converted into a data type that such a model can recognize, such as an adjacency matrix or a graph structure. In one implementation, the interaction data can be converted into an adjacency matrix or a graph structure using software for graph-structured data (e.g., NetworkX).
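The conversion to an adjacency matrix can be illustrated without external dependencies. The text names NetworkX as one option; the sketch below instead builds the matrix by hand so that it stays self-contained, and it assumes an undirected, unweighted network. The function name and node names are hypothetical.

```python
def to_adjacency_matrix(edges, nodes):
    """Build a symmetric 0/1 adjacency matrix for an undirected network.

    edges: list of (node_a, node_b) pairs.
    nodes: ordered list of node names; row and column i correspond to nodes[i].
    """
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        i, j = index[a], index[b]
        matrix[i][j] = 1
        matrix[j][i] = 1  # undirected: mirror the entry
    return matrix


nodes = ["G1", "G2", "G3"]
edges = [("G1", "G2"), ("G2", "G3")]
adjacency = to_adjacency_matrix(edges, nodes)
for row in adjacency:
    print(row)
# [0, 1, 0]
# [1, 0, 1]
# [0, 1, 0]
```

Fixing the node order to match the feature order of the omics matrices lets row i of the adjacency matrix and column i of the omics data refer to the same gene.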
It should be noted that, when the analysis method of the medical event is executed by the analysis apparatus of the medical event, and the analysis apparatus of the medical event includes the obtaining module and the analysis module, the above-mentioned steps 601, 602 and 603 may be executed by the obtaining module.
Step 604, analyzing the multi-omics data and the interaction data to obtain an analysis result of the medical event.
Optionally, the medical event comprises any one or more of the following events: analyzing the sensitivity of an organism to a drug, analyzing the gene interference sensitivity of an organism, analyzing the type of disease the organism suffers from, analyzing the species the organism belongs to, analyzing the causative gene of the disease the organism suffers from, analyzing the type of the organism, and analyzing the remaining survival time of the organism. It should be noted that the above are only examples of medical events. In different applications, the medical event may also be an event other than the above examples, and the embodiments of the present application do not specifically limit it.
In one implementation, step 604 may include: performing data fusion based on the multi-omics data and the interaction data, and analyzing the fused data to obtain the analysis result of the medical event.
The data obtained by fusing the multi-omics data and the interaction data can be represented by a matrix with a plurality of columns. Optionally, before the fused data is analyzed, full connection processing may be performed on it, and the data after full connection processing is then analyzed. Performing full connection processing on the fused data comprises: concatenating the columns of the multi-column matrix, column by column, to obtain a matrix with a single column.
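The column-wise concatenation described above can be shown with a small worked example. The function name and the matrix values are hypothetical, and the fused matrix is represented as a list of rows purely for illustration.

```python
def concatenate_columns(matrix):
    """Stack the columns of a multi-column matrix into one long column.

    matrix: list of rows, all of the same length.
    Returns a flat list: all of column 0, then all of column 1, and so on.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    return [matrix[r][c] for c in range(n_cols) for r in range(n_rows)]


# Hypothetical fused data: 3 features (rows) x 2 fused channels (columns).
fused = [
    [1, 2],
    [3, 4],
    [5, 6],
]
column = concatenate_columns(fused)
print(column)  # [1, 3, 5, 2, 4, 6]
```

The result has as many entries as the original matrix, so no information is lost; only the layout changes to a form that a fully connected layer can consume.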
Analyzing the data after data fusion to obtain an analysis result of the medical event may include: and extracting features from the data after data fusion, and then performing feature analysis on the extracted features to obtain an analysis result of the medical event.
Complex interactions generally exist both between different omics data of an organism (for example, the type of a gene mutation affects the expression level of the gene) and between internal components of single-omics data (for example, between genes, or between metabolites and proteins, in the form of activation, inhibition, or co-expression relationships). Therefore, when a medical event is analyzed, fusing the multi-omics data with the interaction data yields data that better expresses the characteristics of the organism. This helps to analyze comprehensively how different omics features influence life activities and to reveal deeper biological mechanisms, which further improves the accuracy of analyzing the medical event.
Moreover, when the medical event is analyzed, the used interaction data can also be interaction data corresponding to omics data used for analysis, and at the moment, the connection between the omics data and the interaction data is tighter, so that when the medical event is analyzed according to the interaction data, the accuracy of predicting the medical event can be further improved.
It should be noted that, when step 604 is implemented by performing data fusion based on the multi-omics data and the interaction data and analyzing the fused data to obtain the analysis result of the medical event, the medical event analysis model may include a fusion submodel and an analysis submodel. The fusion submodel is used for performing data fusion based on the multi-omics data and the interaction data and outputting the fused data. The analysis submodel is used for analyzing the fused data to obtain the analysis result of the medical event and outputting an analysis value according to the analysis result.
The analysis method for medical events provided by the embodiments of the present application can be applied to various scenarios, provided that the medical event is analyzed using omics data and interaction data. Each omics data in the multi-omics data represents one aspect of the factors that affect the outcome of the medical event, that is, each omics data contributes to the analysis of the medical event. The interaction data represents interaction relationships between different omics data of the organism or between internal components of single-omics data. For example, the method can be applied to scenarios in which, according to the multi-omics data and the interaction data, any of the following is analyzed: the sensitivity of an organism to a drug, the gene interference sensitivity of the organism, the type of disease the organism suffers from, the species the organism belongs to, the causative gene of the disease the organism suffers from, the type of the organism, and the remaining survival time of the organism.
It should be noted that, in different application scenarios of the analysis method provided by the embodiments of the present application, the types of multi-omics data used for the analysis, the types of interaction data, the content referred to by the medical event, and the significance of the analysis differ. Several application scenarios are described below as examples.
When the analysis method provided by the embodiments of the present application is used for analyzing the sensitivity of an organism to a drug, the multi-omics data is omics data of the organism, the interaction data is a biomolecular interaction network of the organism such as a gene co-expression network, and the medical event refers to the sensitivity of the organism to the drug. The organism may be an organism to which a cancer cell line, a cell line in an animal tissue, a cell line of a patient suffering from a disease, or a cell line of a xenograft model animal belongs, and the like. The sensitivity of the organism to the drug can be expressed by the half maximal inhibitory concentration (IC50) of a cell line in the organism. The half maximal inhibitory concentration refers to the drug concentration at which, when the drug is applied to the cell line, the ratio of the number of apoptotic cells in the cell line to the total number of cells in the cell line equals 50%.
By analyzing the sensitivity of the organism to drugs, drugs can be administered to the organism in a targeted manner according to the analysis result and a treatment plan for the organism can be determined, which enables individualized precision medicine for the organism and improves the treatment effect.
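The definition of the half inhibitory concentration above can be made concrete with a small worked sketch that estimates IC50 from measured dose-response points. Linear interpolation between the two measurements that bracket 50% is an assumption of this sketch (dose-response curves are more commonly fitted with a sigmoidal model), and the function name and measurement values are hypothetical.

```python
def estimate_ic50(concentrations, dead_fractions):
    """Estimate the concentration at which half of the cells are dead.

    concentrations: drug concentrations in increasing order.
    dead_fractions: fraction of dead cells observed at each concentration.
    Uses linear interpolation between the two points bracketing 0.5
    (an assumption of this sketch). Returns None if 0.5 is never bracketed.
    """
    points = list(zip(concentrations, dead_fractions))
    for (c0, f0), (c1, f1) in zip(points, points[1:]):
        if f0 <= 0.5 <= f1:
            if f1 == f0:
                return c0
            return c0 + (0.5 - f0) * (c1 - c0) / (f1 - f0)
    return None


# Hypothetical measurements for one cell line and one drug.
concentrations = [1.0, 2.0, 4.0, 8.0]
dead_fractions = [0.1, 0.25, 0.75, 0.9]
ic50 = estimate_ic50(concentrations, dead_fractions)
print(ic50)  # 3.0
```

A lower IC50 means that less drug is needed to kill half of the cells, that is, the cell line is more sensitive to the drug.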
When the analysis method provided by the embodiments of the present application is used for analyzing the gene interference sensitivity of an organism, the multi-omics data is omics data of the organism, the interaction data is a biomolecular interaction network of the organism such as a gene co-expression network, and the medical event refers to the gene interference sensitivity of the organism. The gene interference sensitivity of the organism indicates the degree to which knocking out a particular gene in the sample organism affects the death of the sample organism.
By analyzing the gene interference sensitivity of organisms, anti-cancer target genes can be predicted, so that reverse genetics techniques can be applied to the anti-cancer target genes during cancer treatment to treat the cancer effectively. Predicting an anti-cancer target gene refers to: determining the probability of cancer cell death under the assumption that a gene is knocked out, thereby determining the degree to which the gene influences cancer cell death, and determining the one or more genes with the greatest influence on cancer cell death as the anti-cancer target genes of the cancer cell.
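The selection of anti-cancer target genes described above can be sketched as a ranking step. The sketch assumes that a predicted probability of cancer cell death after a hypothetical knockout is already available for each gene; the function name, the gene identifiers, the probabilities and the choice of keeping the top two genes are all illustrative.

```python
def select_target_genes(death_probability, top_k=2):
    """Rank genes by predicted cell-death probability after knockout.

    death_probability: {gene: predicted probability of cancer cell death
                        if that gene were knocked out}
    top_k: how many of the most influential genes to return (assumed).
    """
    ranked = sorted(death_probability, key=death_probability.get, reverse=True)
    return ranked[:top_k]


# Hypothetical predictions for four genes in one cancer cell line.
death_probability = {"G1": 0.15, "G2": 0.80, "G3": 0.55, "G4": 0.05}
targets = select_target_genes(death_probability)
print(targets)  # ['G2', 'G3']
```

In practice a probability cut-off could be used instead of a fixed count; the text leaves the selection rule open.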
When the medical event analysis method provided by the embodiments of the present application is used to analyze the type of disease from which an organism suffers, the multi-omics data are omics data of the organism, the interaction data is a biomolecular interaction network of the organism, such as a gene co-expression network, and the medical event is the type of disease from which the organism suffers.
Because of differences in factors such as genetic background, patients classified by pathology as having the same type of disease often respond differently to the same drug treatment or the same immunotherapy, and patients identified by pathology as having the same type of cancer can have widely varying survival rates. Patients therefore cannot be classified accurately on the basis of pathology alone. Instead, for different biomedical problems, the type of disease from which a patient suffers can be analyzed (that is, the patient can be typed) based on the multi-omics data and interaction data of the patient, in combination with the clinical data of the patient. Typing patients according to multi-omics data and interaction data improves the accuracy of patient typing, makes it easier to treat different types of patients in a targeted manner, and improves the cure rate.
When the medical event analysis method provided by the embodiments of the present application is used to analyze the category to which an organism belongs, the multi-omics data are omics data of the organism, the interaction data is a biomolecular interaction network of the organism, such as a gene co-expression network, and the medical event is the category to which the organism belongs.
Some organisms come in many varieties, and it is difficult to determine the category to which such an organism belongs from its overt features. When the medical event analysis method provided by the embodiments of the present application is used to classify organisms, the organisms can be classified according to the implicit, genetic features contained in the omics data and interaction data, so the accuracy of the classification can be ensured. For example, the method may be used to analyze the types of single cells. Single cells are of many kinds and are often mixed together, which makes them difficult to analyze; by using the multi-omics data and interaction data of the single cells, the method can classify them more accurately. As another example, when the method is implemented, a patient-derived xenograft (PDX) mouse model of a patient's tumor tissue may be used, and the multi-omics data and interaction data of the PDX mouse may be analyzed, so that the PDX mouse serves as an experimental sample for verifying the analysis result and the medical event is analyzed more comprehensively and systematically.
Taking an application scenario that the analysis method for medical events provided in the embodiment of the present application is used for analyzing the sensitivity of cancer cell lines to drugs as an example, an implementation process of the analysis method for medical events provided in the embodiment of the present application is illustrated below.
In step S11, gene expression data, gene mutation data, and a gene co-expression network of the cancer cell line are acquired.
Gene expression data is omics data representing the transcriptomics level of gene expression. Gene mutation data is omics data representing the genomics level of gene mutation. Both are omics data that reflect the genetic characteristics of the cancer cell line; that is, gene expression data and gene mutation data each represent one factor influencing the sensitivity of the cancer cell line to drugs. A gene co-expression network is a biomolecular interaction network of genes that are in a co-expression relationship. The nodes of the gene co-expression network represent genes, and an edge of the network indicates that a significant co-expression relationship exists between the genes it connects.
Omics data derived from 5780 cancer cell lines can be obtained from the cancer data set pan-cancer profile. The omics data of each cancer cell line includes multiple items of data, and the omics data of each of the 5780 cancer cell lines includes gene mutation data representing 5769 gene features and gene expression data representing 5769 gene features.
In step S12, the gene expression data and the gene mutation data of the cancer cell line are preprocessed.
The omics data of each of the 5780 cancer cell lines includes gene expression data representing 5769 gene features and gene mutation data representing 5769 gene features; that is, gene expression data and gene mutation data are omics data present in all 5780 cancer cell lines. Screening can therefore yield gene mutation data of 5780 samples × 5769 gene features and gene expression data of 5780 samples × 5769 gene features.
In step S13, the gene co-expression network of the cancer cell line is preprocessed.
FIG. 6 is a schematic diagram illustrating the principle of preprocessing a gene co-expression network according to an embodiment of the present application. The gene co-expression network on the left of FIG. 6 has not been preprocessed, and the number on each edge of the network indicates the confidence of the co-expression relationship between the genes connected by that edge. Assuming that a co-expression relationship must satisfy a confidence threshold of 0.9 to be used in the analysis, the co-expression relationships in the gene co-expression network can be screened according to this threshold, and all co-expression relationships with a confidence lower than 0.9 are deleted, which yields the updated gene co-expression network shown on the right of FIG. 6.
After the co-expression relationships in the gene co-expression network have been screened, the updated gene co-expression network can be converted into an adjacency matrix using the NetworkX software for subsequent use.
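The screening and conversion described above can be sketched as follows. The gene names and confidence values are hypothetical, and the sketch builds the adjacency matrix with plain Python lists rather than NetworkX so that it runs without third-party packages; it is an illustration under these assumptions, not the implementation of the application.

```python
def filter_edges(edges, threshold):
    """Keep only co-expression relations whose confidence meets the threshold."""
    return [(u, v, conf) for u, v, conf in edges if conf >= threshold]


def to_adjacency_matrix(genes, edges):
    """Build a symmetric 0/1 adjacency matrix in the order given by `genes`."""
    index = {gene: i for i, gene in enumerate(genes)}
    matrix = [[0] * len(genes) for _ in genes]
    for u, v, _ in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1
    return matrix


# Hypothetical network: (gene, gene, confidence of the co-expression relation).
genes = ["gene_a", "gene_b", "gene_c", "gene_d"]
edges = [
    ("gene_a", "gene_b", 0.95),
    ("gene_b", "gene_c", 0.60),
    ("gene_c", "gene_d", 0.92),
    ("gene_a", "gene_d", 0.85),
]

kept = filter_edges(edges, threshold=0.9)
adjacency = to_adjacency_matrix(genes, kept)
```

Only the two relations with confidence of at least 0.9 survive the screening, so the matrix has nonzero entries only for the gene_a/gene_b and gene_c/gene_d pairs.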
In step S14, the preprocessed gene expression data and gene mutation data of the cancer cell line and the preprocessed gene co-expression network are analyzed to obtain the sensitivity of the cancer cell line to drugs.
Data fusion can be performed on the preprocessed gene expression data and gene mutation data of the cancer cell line and the preprocessed gene co-expression network to obtain fused data represented as a matrix. The matrix representing the fused data is then concatenated by columns to obtain a column matrix with a single column. The column matrix is then analyzed to predict the sensitivity of the cancer cell line to the drug.
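One possible reading of the column-wise concatenation above is that the columns of the fused matrix are placed one below another to form a single column. The sketch below illustrates that reading with a hypothetical 2 × 3 matrix; the column-major order is an assumption, since the application does not state the order in which the columns are joined.

```python
def stack_columns(matrix):
    """Concatenate the columns of a row-major matrix into one column."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    # Walk column by column, top to bottom within each column.
    return [[matrix[r][c]] for c in range(n_cols) for r in range(n_rows)]


# Hypothetical fused data with 2 rows and 3 columns.
fused = [
    [1, 2, 3],
    [4, 5, 6],
]
column = stack_columns(fused)
```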
The implementation of data fusion based on multi-omics data and interaction data in step 604 is described below. Data fusion of multi-omics data and interaction data can be implemented in various ways, and one implementation is taken as an example below. As shown in FIG. 7, data fusion based on multi-omics data and interaction data may include the following steps:
Step 6041: convert the multi-omics data into the same multidimensional data space.
The multi-omics data of an organism includes data generated at multiple omics levels, such as the genome, epigenome, transcriptome, proteome, metabolome, and lipidome. Different omics data have different biological meanings and reflect the regularities of life at different stages and from different levels. In addition, different types of omics data differ in their data properties and distributions, so multi-omics data is both strongly heterogeneous and clearly hierarchical. For example, gene expression data consists of continuous values, with second-generation sequencing data following a Poisson distribution and gene microarray data following a normal distribution, whereas gene mutation data consists of 0/1 values following a binomial distribution. The multi-omics data can therefore be converted into the same multidimensional data space to address its heterogeneity and hierarchy. Each dimension of the multidimensional data space represents one factor affecting the analysis result of the medical event. Heterogeneity means that different omics data are represented in data spaces of different types. Hierarchy means that different omics data reflect different molecular stages of life activities.
The process of converting the multi-omics data into the same multidimensional data space includes representing each omics data using data in a first multidimensional data space. The specific way of converting the multi-omics data into the same multidimensional data space depends on the meaning represented by each dimension of the first multidimensional data space. Two implementations are described below as examples.
In a first implementation, for first omics data representing a first omics level, the value of the first dimension, which represents the first omics level in the first multidimensional data space, is set to the value of the first omics data, and the values of the other dimensions, which represent the other omics levels in the first multidimensional data space, are set to 0. The first omics level is any one of the omics levels represented by the multi-omics data.
In one implementation, when a fusion submodel is used to perform data fusion based on the multi-omics data and interaction data, the fusion submodel may include an embedding layer, and the first implementation may be carried out by the embedding layer.
By representing the multi-omics data using data in the same data space, the multi-omics data can be represented by data that follows the same standard, which addresses the heterogeneity among the multi-omics data.
TABLE 1
          Omics data 1    Omics data 2    Omics data 3
Gene 1    1               0.1214          0.1514
Gene 2    0               0.1477          0.1177
For example, assume that the multi-omics data includes omics data 1 representing the genomics level of gene mutation, omics data 2 representing the transcriptomics level of gene expression, and omics data 3 representing the omics level of protein expression. As shown in Table 1, assume that omics data 1 of gene 1 is 1, omics data 1 of gene 2 is 0, omics data 2 of gene 1 is 0.1214, omics data 2 of gene 2 is 0.1477, omics data 3 of gene 1 is 0.1514, and omics data 3 of gene 2 is 0.1177. The first multidimensional data space used to represent gene mutation, gene expression, and protein expression is then a three-dimensional data space. Its first dimension represents the genomics level of gene mutation, its second dimension represents the transcriptomics level of gene expression, and its third dimension represents the omics level of protein expression, each of which affects the analysis result of the medical event. After omics data 1, omics data 2, and omics data 3 are converted into the three-dimensional data space, omics data 1 of gene 1 may be represented as (1, 0, 0), omics data 1 of gene 2 as (0, 0, 0), omics data 2 of gene 1 as (0, 0.1214, 0), omics data 2 of gene 2 as (0, 0.1477, 0), omics data 3 of gene 1 as (0, 0, 0.1514), and omics data 3 of gene 2 as (0, 0, 0.1177).
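The first implementation can be sketched with the values of Table 1. The level names and the function `embed_first_way` are hypothetical names introduced for illustration; the sketch places each value in the dimension of its own omics level and sets the other dimensions to 0, and it does not claim to reproduce the embedding layer of the application.

```python
# Order of the dimensions of the three-dimensional data space (hypothetical names).
OMICS_LEVELS = ["gene_mutation", "gene_expression", "protein_expression"]


def embed_first_way(level, value):
    """Put `value` in the dimension of its omics level and 0 elsewhere."""
    vector = [0.0] * len(OMICS_LEVELS)
    vector[OMICS_LEVELS.index(level)] = float(value)
    return tuple(vector)


# Values taken from Table 1.
table_1 = {
    "gene_1": {"gene_mutation": 1, "gene_expression": 0.1214, "protein_expression": 0.1514},
    "gene_2": {"gene_mutation": 0, "gene_expression": 0.1477, "protein_expression": 0.1177},
}

embedded = {
    gene: {level: embed_first_way(level, value) for level, value in row.items()}
    for gene, row in table_1.items()
}
```

After the conversion, every value is a point in the same three-dimensional space, whichever omics level it came from.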
In a second implementation, a first multidimensional data space with at least three dimensions is established. The dimensions of this space respectively represent items of information about the omics data, such as the identifier of the sample, the type of the omics data (that is, the type of omics level that the omics data represents), and the omics level that the omics data represents. For each omics data, the value of each item of information is then determined, for example the identifier of the sample to which the omics data belongs, the identifier of the omics type of the omics data, and the value of the omics level that the omics data represents. These values are then placed on the corresponding dimensions of the first multidimensional data space, so that the multi-omics data is represented in the first multidimensional data space.
In one implementation, when a fusion submodel is used to perform data fusion based on the multi-omics data and interaction data, the fusion submodel may include an embedding layer, and the second implementation may be carried out by the embedding layer. Data in the same data space has the same data properties. By representing the multi-omics data using data in the same data space, the multi-omics data can be represented by data with the same data properties, which addresses the heterogeneity among the multi-omics data.
For example, assume that the multi-omics data includes omics data 1, omics data 2, and omics data 3. Omics data 1 comes from sample 1 (the sample identifier is 1), the genomics level of gene mutation that it represents is 1, and the type identifier of gene mutation is 1. Omics data 1 can then be represented using a first multidimensional data space with three dimensions, which respectively represent the identifier of the sample, the type identifier of the omics data, and the omics level represented by the omics data. The representation of omics data 1 in this three-dimensional data space is (1, 1, 1).
Omics data 2 comes from sample 1 (the sample identifier is 1), the transcriptomics level of gene expression that it represents is 0.1214, and the type identifier of gene expression is 2. The representation of omics data 2 in the same three-dimensional data space is therefore (1, 2, 0.1214).
Omics data 3 comes from sample 2 (the sample identifier is 2), the omics level of protein expression that it represents is 0.1514, and the type identifier of protein expression is 3. The representation of omics data 3 in the same three-dimensional data space is therefore (2, 3, 0.1514).
Accordingly, referring to FIG. 8, omics data 1, omics data 2, and omics data 3 are each one-dimensional data. After they are converted into the three-dimensional data space, each converted omics data can be represented by a three-dimensional space (for example, the three cuboids filled with dots of different densities in FIG. 8), whose three dimensions respectively represent the identifier of the sample, the type identifier of the omics data, and the omics level represented by the omics data.
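The second implementation can be sketched as follows, using the values of the example above. The names `OmicsRecord`, `TYPE_IDS`, and `embed_second_way` are hypothetical and introduced only for illustration.

```python
from typing import NamedTuple


class OmicsRecord(NamedTuple):
    """A point in the three-dimensional data space of the second implementation."""
    sample_id: int
    type_id: int
    level: float


# Type identifiers as used in the example above.
TYPE_IDS = {"gene_mutation": 1, "gene_expression": 2, "protein_expression": 3}


def embed_second_way(sample_id, omics_type, level):
    """Represent one omics value as (sample identifier, type identifier, level)."""
    return OmicsRecord(sample_id, TYPE_IDS[omics_type], level)


omics_1 = embed_second_way(1, "gene_mutation", 1)
omics_2 = embed_second_way(1, "gene_expression", 0.1214)
omics_3 = embed_second_way(2, "protein_expression", 0.1514)
```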
Further, the process of converting the multi-omics data into the same multidimensional data space may also include stacking the multi-omics data represented in the first multidimensional data space, with different omics data treated as different channels. Stacking means combining the multi-omics data represented in the first multidimensional data space so that the combined data is logically represented as one set of data. In one implementation, a dimension can be added to the first multidimensional data space to obtain a second multidimensional data space, where the added dimension represents the identifier of the omics data, so that the omics data is represented in the second multidimensional data space. Because the second multidimensional data space identifies each omics data, the levels of the multi-omics data can be integrated along the omics-data dimension, which addresses the hierarchy of the omics data.
Continuing the example given for the second implementation, the representation of omics data 1 in the three-dimensional data space is (1, 1, 1), that of omics data 2 is (1, 2, 0.1214), and that of omics data 3 is (2, 3, 0.1514). Adding a dimension that represents the identifier of the omics data to the three-dimensional data space yields a four-dimensional data space. The identifier of omics data 1 is 1, that of omics data 2 is 2, and that of omics data 3 is 3, so the representation of omics data 1 in the four-dimensional data space is (1, 1, 1, 1), that of omics data 2 is (1, 2, 0.1214, 2), and that of omics data 3 is (2, 3, 0.1514, 3).
With continued reference to FIG. 8, stacking the multi-omics data represented in the first multidimensional data space adds a fourth dimension to the original three-dimensional space and yields a four-dimensional space, where the fourth dimension represents the identifier of the omics data. Correspondingly, the three cuboids representing omics data 1, omics data 2, and omics data 3 are arranged in sequence along the fourth dimension, which gives the four-dimensional figure obtained by stacking the three cuboids. In FIG. 8, shapes filled with patterns of the same density represent data obtained from the same omics data. That is, the cuboid filled with the lowest-density dots and the cuboid filled with the lowest-density diagonal lines both represent data obtained from omics data 1, the cuboid filled with medium-density dots and the cuboid filled with medium-density diagonal lines both represent data obtained from omics data 2, and the cuboid filled with the highest-density dots and the cuboid filled with the highest-density diagonal lines both represent data obtained from omics data 3.
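The stacking step can be sketched by appending the identifier of the omics data as a fourth coordinate. The function name `stack_as_channels` is hypothetical, and numbering the omics data from 1 in list order is an assumption that matches the example above.

```python
def stack_as_channels(points):
    """Append a 1-based omics-data identifier to each three-dimensional point."""
    return [point + (omics_id,) for omics_id, point in enumerate(points, start=1)]


# Three-dimensional representations of omics data 1, 2 and 3 from the example.
three_d = [(1, 1, 1), (1, 2, 0.1214), (2, 3, 0.1514)]
four_d = stack_as_channels(three_d)
```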
Step 6042: update the multi-omics data converted into the multidimensional data space based on the interaction data.
Optionally, the interaction data may be fused with the multi-omics data converted into the multidimensional data space, so that the converted multi-omics data is updated using the interaction data. In one implementation, the fusion may be performed using an AI model whose input data is graph-structured data, such as a graph convolutional neural network. It should be understood that the interaction data can be fused with the converted multi-omics data in other ways; the use of an AI model is only an example and is not intended to limit the implementation.
Before the interaction data is fused with the multi-omics data converted into the multidimensional data space, Laplacian normalization may be performed on the interaction data to convert it into data with the same data structure as the converted multi-omics data.
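The application does not give a formula for the Laplacian normalization. The sketch below assumes the symmetric normalization with self-loops that is commonly used with graph convolutional networks, D^(-1/2) (A + I) D^(-1/2), where A is the adjacency matrix and D is the degree matrix of A + I. The three-gene network is hypothetical.

```python
import math


def normalize_adjacency(adjacency):
    """Return D^(-1/2) (A + I) D^(-1/2) for a square 0/1 adjacency matrix."""
    n = len(adjacency)
    # Add self-loops so that each node keeps part of its own signal.
    with_loops = [
        [adjacency[i][j] + (1 if i == j else 0) for j in range(n)]
        for i in range(n)
    ]
    degrees = [sum(row) for row in with_loops]
    return [
        [with_loops[i][j] / math.sqrt(degrees[i] * degrees[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical chain of three genes: gene 0 - gene 1 - gene 2.
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
normalized = normalize_adjacency(adjacency)
```

The normalized matrix stays symmetric, and entries for gene pairs without a co-expression relationship remain 0.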
With continued reference to FIG. 8, the cuboids filled with dots of different densities represent the multi-omics data converted into the multidimensional data space, and the cuboids filled with diagonal lines of different densities represent the data obtained by updating the converted multi-omics data using the interaction data.
Updating the multi-omics data converted into the multidimensional data space with the interaction data effectively integrates the interaction data and the omics data. The integrated data contains the data properties of the different omics levels as well as the interaction relationships among the multi-omics data of the organism or among the internal components of a single omics data. When a medical event is analyzed according to the integrated data, more information is available to the analysis, and the accuracy of the analysis can be effectively improved.
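As a minimal sketch of updating omics data with interaction data, the code below multiplies a normalized adjacency matrix by a matrix of per-gene features, which is the propagation step of a graph convolution with the learned weights and the nonlinearity omitted. The two-gene network and the reuse of the Table 1 values as features are assumptions made for illustration.

```python
def propagate(normalized_adjacency, features):
    """Multiply the normalized adjacency matrix by the node feature matrix."""
    n = len(features)
    width = len(features[0])
    return [
        [
            sum(normalized_adjacency[i][k] * features[k][j] for k in range(n))
            for j in range(width)
        ]
        for i in range(n)
    ]


# Two genes joined by one co-expression edge. With self-loops added and
# symmetric normalization applied, every entry of the matrix is 0.5.
normalized_adjacency = [
    [0.5, 0.5],
    [0.5, 0.5],
]
# Per-gene features: (gene mutation, gene expression), values from Table 1.
features = [
    [1.0, 0.1214],
    [0.0, 0.1477],
]
updated = propagate(normalized_adjacency, features)
```

Each gene's updated features become a mix of its own values and those of its co-expressed neighbor, which is how the interaction relationships enter the omics data.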
Step 6043: perform feature fusion on the updated multi-omics data and the multi-omics data converted into the multidimensional data space.
The process of feature fusion includes: performing feature extraction on the updated multi-omics data to obtain first feature data; performing feature extraction on the multi-omics data converted into the multidimensional data space to obtain second feature data; and then performing feature fusion on the first feature data and the second feature data.
Optionally, a convolution operation may be performed on the updated multi-omics data to extract its features, and a convolution operation may be performed on the multi-omics data converted into the multidimensional data space to extract its features. The convolution kernel used by the convolution operation may be selected according to the features required for analyzing the medical event, so as to emphasize those features. Further, one or more convolution operations may be performed on the updated multi-omics data, according to actual needs, to obtain the first feature data, and/or one or more convolution operations may be performed on the converted multi-omics data, according to actual needs, to obtain the second feature data. When a fusion submodel is used to perform data fusion based on the multi-omics data and interaction data, the fusion submodel may include a convolution layer, and the convolution operation may be performed on the data by the convolution layer.
It should be understood that feature extraction on the updated multi-omics data and on the multi-omics data converted into the multidimensional data space may also be implemented in ways other than a convolution operation, and the embodiments of the present application do not specifically limit this.
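To illustrate feature extraction by convolution, the sketch below applies a one-dimensional kernel to a sequence of values without padding. As in most deep-learning frameworks, the kernel is not flipped, so the operation is strictly a cross-correlation. The input values and the difference-emphasizing kernel are hypothetical; the application does not specify the kernels it uses.

```python
def conv1d(signal, kernel):
    """Slide `kernel` over `signal` without padding and return the responses."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


values = [1.0, 2.0, 4.0, 7.0]
kernel = [-1.0, 1.0]  # hypothetical kernel that emphasizes neighbor differences
features = conv1d(values, kernel)
```

Choosing a different kernel changes which aspect of the data the extracted features emphasize.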
Because the first feature data and the second feature data also carry information about the omics data, feature fusion of the first feature data and the second feature data may include: identifying the data in the first feature data and the data in the second feature data that carry the same omics data, and merging them. Merging multiple pieces of data may mean computing their weighted sum, and the weight of each piece of data may be determined according to actual needs. For example, when the multi-omics data and the interaction data are equally important for analyzing the medical event, the weight of the data in the first feature data and the weight of the data in the second feature data may both be 0.5.
Referring to FIG. 8, the three groups of rectangular boxes filled with diagonal lines of different densities represent the first feature data obtained by performing a convolution operation on the updated multi-omics data, and the three groups of rectangular boxes filled with dots of different densities represent the second feature data obtained by performing a convolution operation on the multi-omics data converted into the multidimensional data space. After feature fusion is performed on the first feature data and the second feature data, the data carrying the same omics data in the first feature data and the second feature data is merged.
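The weighted merging described above can be sketched as follows. The keys `omics_1` and `omics_2` and the feature values are hypothetical; the default weights of 0.5 correspond to the case in which the multi-omics data and the interaction data are equally important.

```python
def fuse(first, second, w_first=0.5, w_second=0.5):
    """Merge features that carry the same omics data by a weighted sum."""
    return {
        key: [w_first * a + w_second * b for a, b in zip(first[key], second[key])]
        for key in first
        if key in second
    }


# First feature data: extracted from the updated multi-omics data.
first_features = {"omics_1": [0.2, 0.4], "omics_2": [1.0, 0.0]}
# Second feature data: extracted from the converted multi-omics data.
second_features = {"omics_1": [0.6, 0.0], "omics_2": [0.0, 1.0]}

fused = fuse(first_features, second_features)
```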
After feature fusion is performed on the updated multi-omics data and the multi-omics data converted into the multidimensional data space, the fused data contains both the features of the original multi-omics data and the features of the multi-omics data into which the interaction data has been fused. The fused data therefore contains multiple types of biological information and can provide more comprehensive and more implicit biological information for the analysis of the medical event, which improves the accuracy of the analysis.
It should be noted that data fusion based on multi-omics data and interaction data may include some or all of steps 6041 to 6043, and the embodiments of the present application do not specifically limit this.
For example, when data fusion based on multi-omics data and interaction data includes only step 6042, the process is: updating the multi-omics data based on the interaction data. The multi-omics data updated based on the interaction data is the result of data fusion of the multi-omics data and the interaction data.
As another example, when data fusion based on multi-omics data and interaction data includes step 6041 and step 6042, the process is: converting the multi-omics data into the same multidimensional data space, and updating the converted multi-omics data based on the interaction data. The converted multi-omics data updated based on the interaction data is the result of data fusion of the multi-omics data and the interaction data.
As another example, when data fusion based on multi-omics data and interaction data includes step 6042 and step 6043, the process is: updating the multi-omics data based on the interaction data, and performing feature fusion on the updated multi-omics data and the original multi-omics data. The data obtained by this feature fusion is the result of data fusion of the multi-omics data and the interaction data.
Optionally, before the medical event is analyzed using the feature-fused data, a preset discarding strategy (for example, a Dropout Gating method) may be applied to the feature-fused data according to the goal of analyzing the medical event, so as to remove data that is unimportant for the analysis. In addition, a preset weighting strategy (for example, an Attention method) may be applied to the feature-fused data according to the goal of analyzing the medical event, so as to highlight the data that is more important for the analysis and further improve the accuracy of analyzing the medical event.
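The discarding and weighting strategies can be illustrated with two simple stand-ins: a threshold gate that zeroes features whose importance score is too low, and softmax weights that scale each feature by its relative importance. Neither is claimed to be the Dropout Gating or Attention method of the application; the importance scores, which a trained model would normally produce, are hypothetical.

```python
import math


def softmax(scores):
    """Convert arbitrary scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def gate(features, scores, keep_threshold):
    """Zero out features whose importance score is below the threshold."""
    return [f if s >= keep_threshold else 0.0 for f, s in zip(features, scores)]


def attend(features, scores):
    """Scale each feature by its softmax weight."""
    return [w * f for w, f in zip(softmax(scores), features)]


fused_features = [1.0, 2.0, 3.0]
importance = [0.1, 0.9, 0.5]  # hypothetical importance scores

gated = gate(fused_features, importance, keep_threshold=0.5)
weighted = attend(fused_features, importance)
```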
After the multi-omics data and interaction data are fused by the fusion submodel to obtain fused analysis data, the output of the fusion submodel can be input into the analysis submodel. The analysis submodel performs further operations on the fused analysis data, such as feature extraction, decoding, and prediction, and finally produces the analysis result of the medical event. The analysis submodel may be an AI model with classification, prediction, or recognition capabilities; for example, it may include convolutional layers, fully connected layers, and the like.
As described above, the medical event analysis model may be used to analyze multi-omics data and interaction data to obtain an analysis result of the medical event, and the medical event analysis model may be implemented by an AI model. Optionally, the medical event analysis model may be an AI model whose input data is graph-structured data; for example, it may be a graph neural network or a graph convolutional neural network.
Before the medical event analysis model is used for analysis, it may be trained using sample multi-omics data and sample interaction data. When a supervised training process is used to train the medical event analysis model, the sample multi-omics data and sample interaction data may also carry sample labels. The sample label carried by sample data indicates the expected analysis result that the trained medical event analysis model should output when the sample data is input to it. Correspondingly, training the medical event analysis model is a process of continuously adjusting parameters of the model, such as weight values, according to the expected analysis result and the actual analysis result of the model.
In one implementation of training the medical event analysis model, a preset model with a fixed model structure may be trained, and parameters such as the weight values of the preset model are adjusted according to the learning target during training, to obtain the medical event analysis model for analyzing multi-omics data and interaction data. The preset model may be an AI model that already exists in the industry and has good prediction performance. The training process may include the following steps: initialize parameters such as the weight values of the connections between neurons in the preset model, and input the labeled sample data in the training sample set to the preset model; obtain the actual analysis result output by the preset model for the labeled sample data, obtain the expected analysis result indicated by the sample labels, and determine a first error between the actual analysis result and the expected analysis result; then adjust parameters such as the weight values of the connections between neurons in the preset model according to the first error, until the first error determined from the labeled sample data in the training sample set reaches its minimum value or the number of training iterations reaches a specified number.
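The training steps above can be illustrated with a minimal sketch. It is not the model of the application: it assumes a one-layer logistic classifier as the "preset model", cross-entropy as the "first error", and a small change in error between iterations as a stand-in for "reaches its minimum value". The function name `train_preset_model` and all of its parameters are hypothetical.

```python
import numpy as np

def train_preset_model(features, labels, learning_rate=0.1,
                       max_epochs=500, tolerance=1e-6, seed=0):
    """Train a one-layer logistic classifier by gradient descent (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize the weight values of the preset model.
    weights = rng.normal(scale=0.01, size=features.shape[1])
    bias = 0.0
    previous_error = np.inf
    error = np.inf
    for _ in range(max_epochs):  # stop after the specified number of training iterations
        # Step 2: actual analysis result for the labeled sample data.
        predicted = 1.0 / (1.0 + np.exp(-(features @ weights + bias)))
        # Step 3: first error between actual and expected results (cross-entropy, assumed).
        eps = 1e-12
        error = float(-np.mean(labels * np.log(predicted + eps)
                               + (1.0 - labels) * np.log(1.0 - predicted + eps)))
        # Treat a negligible change in error as having reached the minimum (assumption).
        if abs(previous_error - error) < tolerance:
            break
        previous_error = error
        # Step 4: adjust the weight values according to the first error.
        gradient = predicted - labels
        weights -= learning_rate * (features.T @ gradient) / len(labels)
        bias -= learning_rate * gradient.mean()
    return weights, bias, error
```

A real medical event analysis model would replace the logistic layer with a graph-based network, but the two stopping conditions (error at its minimum, or iteration budget exhausted) would be checked in the same way.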
In another implementation of training the medical event analysis model, a model search method (for example, a grid search method) may be used to search over the structures of candidate models, and the candidate models found may be trained during the search to adjust parameters such as the weight values of each candidate model, to obtain the medical event analysis model for analyzing multi-omics data and interaction data. The training process may include the following steps: determine a plurality of candidate medical event analysis models, where any two candidate models differ in one or more model parameters; train each candidate model using the labeled sample data in the training sample set to obtain a plurality of trained candidate medical event analysis models; and determine, as the medical event analysis model used to analyze the medical event, the trained candidate model whose performance parameters indicate the highest accuracy.
Each model parameter can be regarded as a variable. Assigning different values to these variables yields a plurality of model parameter sets; any two parameter sets differ in the value of at least one variable, and each model parameter set defines the structure of one candidate medical event analysis model. Accordingly, the process of assigning different values to the model parameters to obtain a plurality of model parameter sets is the process of determining the plurality of candidate medical event analysis models. The model parameters may include one or more of the following: the network structure type of the AI model, the number of network layers of the AI model, the number of neurons in each network layer, the connection mode between neurons, the batch size used during model training, the learning rate, the strategy for optimizing the learning rate, the dropout strategy for discarding data, the weighting strategy applied to the data, and the like.
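The construction of model parameter sets and the selection of the most accurate candidate can be sketched as follows. This is a minimal illustration under stated assumptions: the parameter names in the example and the `evaluate` callback (which stands in for training a candidate and measuring its accuracy) are hypothetical and are not specified by the application.

```python
from itertools import product

def enumerate_parameter_sets(search_space):
    """Build every combination of parameter values; each combination is one parameter set."""
    names = list(search_space)
    value_lists = [search_space[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]

def select_best_candidate(parameter_sets, evaluate):
    """Return the parameter set with the highest score reported by `evaluate`.

    `evaluate` is assumed to train the candidate model defined by a parameter set
    and return its accuracy on held-out labeled samples.
    """
    scored = [(evaluate(parameters), parameters) for parameters in parameter_sets]
    best_score, best_parameters = max(scored, key=lambda item: item[0])
    return best_parameters, best_score

# Example search space with hypothetical values.
search_space = {"num_layers": [2, 3], "learning_rate": [0.01, 0.1]}
candidates = enumerate_parameter_sets(search_space)  # four parameter sets
```

Because the sets are produced as a Cartesian product of distinct values, any two of them differ in at least one variable, which matches the condition stated above.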
Further, to demonstrate the accuracy of the medical event analysis method provided in the embodiments of the present application, 24 of the 578 cancer samples were typed based on genomics data and the protein interaction networks of the cancer samples in the pan-cancer profile dataset disclosed in the cancer gene dataset. The analysis methods compared with the medical event analysis method provided in the embodiments of the present application are: the eXtreme gradient boosting (XGBoost) algorithm, the Keras-based automatic machine learning (AutoKeras) algorithm (Keras being an open-source artificial neural network library), the automatic genome (AutoGenome) algorithm, the automatic omics (AutoOmics) algorithm, and the graph deep learning single-omics (GDL-Single) algorithm.
The XGBoost, AutoKeras, AutoGenome, and GDL-Single algorithms perform cancer typing based on single-omics data, and the AutoOmics algorithm performs cancer typing based on multi-omics data. The typing accuracies of the XGBoost, AutoKeras, AutoGenome, AutoOmics, and GDL-Single algorithms are 0.911, 0.910, 0.963, 0.972, and 0.968, respectively. The accuracy of the medical event analysis method provided in the embodiments of the present application reaches 0.982, which is higher than that of each compared algorithm.
In summary, the medical event analysis method provided in the embodiments of the present application obtains multi-omics data and interaction data of an organism and analyzes them to obtain an analysis result of the medical event. Because the interaction data represents the interaction relationships among the multi-omics data or among the internal components of single-omics data, the method considers not only the multi-omics data but also these interaction relationships, so that the data used to analyze the medical event better expresses the characteristics of the organism, which effectively improves the accuracy of the analysis.
The sequence of steps of the method for analyzing the medical events provided by the embodiment of the application can be properly adjusted, and the steps can be correspondingly increased or decreased according to the situation. For example, when analyzing the medical event, some or all of steps 6041, 6042 and 6043 may be selected and executed according to actual requirements. Any method that can be easily conceived by a person skilled in the art within the technical scope disclosed in the present application is covered by the protection scope of the present application, and thus the detailed description thereof is omitted.
The application provides an analysis device for medical events. As shown in fig. 1, the medical event analysis device 10 may include:
an acquisition module 101, configured to acquire multi-omics data and interaction data of an organism, where the interaction data represents the interaction relationships among the multi-omics data of the organism or among the internal components of single-omics data.
an analysis module 102, configured to analyze the multi-omics data and the interaction data to obtain an analysis result of the medical event.
Optionally, the multi-omics data of the organism comprises any one or more of the following: gene mutation data, gene expression data, deoxyribonucleic acid (DNA) methylation data, copy number variation data, micro ribonucleic acid (microRNA) expression data, histone modification data, gene fusion data, chromosome isomerism data, and metabolite expression data.
Optionally, the medical event comprises any one or more of the following events: analyzing the sensitivity of an organism to a drug, analyzing the susceptibility of an organism to gene interference, analyzing the type of disorder the organism suffers from, analyzing the species the organism belongs to, analyzing the causative gene of the disorder the organism suffers from, analyzing the type of organism, and analyzing the remaining life time of the organism.
Optionally, as shown in fig. 9, the analysis module 102 includes:
a fusion submodule 1021, configured to perform data fusion based on the multi-omics data and the interaction data.
an analysis submodule 1022, configured to analyze the fused data to obtain the analysis result of the medical event.
Optionally, the fusion submodule 1021 is specifically configured to: convert the multi-omics data into the same multidimensional data space, where the data in each dimension of the multidimensional data space represents one aspect of the factors that influence the analysis result of the medical event; update the multi-omics data converted into the multidimensional data space based on the interaction data; and perform feature fusion on the updated multi-omics data and the multi-omics data converted into the multidimensional data space.
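The three fusion operations can be sketched with NumPy as follows. This is only an illustration under several assumptions that the application does not state: each omics type is a matrix with one row per gene, the conversion into a shared space is a linear projection, the update averages each gene with its neighbors in the interaction network, and feature fusion is a concatenation. The function name `fuse_multi_omics` and its arguments are hypothetical.

```python
import numpy as np

def fuse_multi_omics(omics_blocks, projections, adjacency):
    """Project, update with interaction data, and fuse (illustrative sketch).

    omics_blocks: list of arrays, each of shape (num_genes, d_k), one per omics type.
    projections:  list of arrays, each of shape (d_k, hidden), mapping into a shared space.
    adjacency:    array of shape (num_genes, num_genes) describing gene interactions.
    """
    # 1. Convert every omics type into the same multidimensional data space.
    embedded = [block @ projection
                for block, projection in zip(omics_blocks, projections)]
    # 2. Update the converted data based on the interaction data:
    #    add self-loops, row-normalize, and average over interacting genes.
    with_self_loops = adjacency + np.eye(adjacency.shape[0])
    normalized = with_self_loops / with_self_loops.sum(axis=1, keepdims=True)
    updated = [normalized @ features for features in embedded]
    # 3. Feature fusion of the updated data and the converted data (concatenation, assumed).
    return np.concatenate(updated + embedded, axis=1)
```

With K omics types and a shared space of size `hidden`, the fused matrix has `2 * K * hidden` columns per gene; a gene without any interaction partner keeps its converted features unchanged in the updated half.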
Optionally, as shown in fig. 10, the apparatus 10 for analyzing a medical event further includes:
a preprocessing submodule 103, configured to screen the interaction data based on the confidence of the interaction relationships in the interaction data, to obtain updated interaction data.
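Confidence-based screening can be sketched as a simple threshold filter. The edge format `(source, target, confidence)`, the threshold value, and the name `screen_interactions` are assumptions made for illustration; the application does not fix a particular threshold or data layout.

```python
def screen_interactions(edges, min_confidence=0.7):
    """Keep only interaction relationships whose confidence meets the threshold.

    edges: iterable of (source, target, confidence) tuples (hypothetical layout).
    """
    return [(source, target, confidence)
            for source, target, confidence in edges
            if confidence >= min_confidence]

# Hypothetical interaction records; the names and scores are placeholders.
raw_edges = [("gene_a", "gene_b", 0.95),
             ("gene_a", "gene_c", 0.40),
             ("gene_b", "gene_c", 0.70)]
updated_edges = screen_interactions(raw_edges, min_confidence=0.7)
```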
Optionally, the interaction data to be analyzed is represented using an adjacency matrix or a graph structure.
And/or, the interaction data comprises any one or more of the following: protein interaction networks, gene regulation networks, gene co-expression networks, and metabolic networks.
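As an illustration of the adjacency-matrix representation, the following sketch converts a list of interaction records into a symmetric weighted matrix. It assumes undirected interactions (as in a protein interaction network) and a `(source, target, weight)` record layout; directed networks such as gene regulation networks would fill only one of the two entries. The name `build_adjacency_matrix` is hypothetical.

```python
import numpy as np

def build_adjacency_matrix(nodes, edges):
    """Build a symmetric weighted adjacency matrix from (source, target, weight) records."""
    index = {node: position for position, node in enumerate(nodes)}
    matrix = np.zeros((len(nodes), len(nodes)))
    for source, target, weight in edges:
        # Undirected interaction: store the weight in both directions.
        matrix[index[source], index[target]] = weight
        matrix[index[target], index[source]] = weight
    return matrix
```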
Optionally, the analysis module 102 is specifically configured to: analyze the multi-omics data and the interaction data using an AI model to obtain an analysis result of the medical event, where the AI model is obtained by training on sample multi-omics data and sample interaction data.
In summary, in the medical event analysis apparatus provided in the embodiments of the present application, the acquisition module acquires multi-omics data and interaction data of an organism, and the analysis module analyzes the multi-omics data and the interaction data to obtain an analysis result of the medical event. Because the interaction data represents the interaction relationships among the multi-omics data or among the internal components of single-omics data, the apparatus considers not only the multi-omics data but also these interaction relationships, so that the data used to analyze the medical event better expresses the characteristics of the organism, which effectively improves the accuracy of the analysis.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and modules may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The present application also provides a computing device comprising a processor and a memory; the memory has stored therein a computer program; when the processor executes the computer program, the computing device executes the method for analyzing a medical event provided by the present application. The structure of the computing device may refer to the structure of the computing device shown in fig. 4.
The present application also provides a computer-readable storage medium, which may be a non-transitory readable storage medium, and when instructions in the computer-readable storage medium are executed by a computer, the computer is configured to perform the method for analyzing a medical event provided by the present application. The computer readable storage medium includes, but is not limited to, volatile memory such as random access memory, non-volatile memory such as flash memory, Hard Disk Drive (HDD), Solid State Drive (SSD).
The present application also provides a computer program product comprising computer instructions which, when executed by a computing device, cause the computing device to perform the method of analyzing a medical event provided herein.
Furthermore, the computer program product may be a software installation package, and in the case that the analysis method for medical events provided by the embodiment of the present application needs to be used, the computer program product may be downloaded and executed on a computing device.
Embodiments of the present application further provide a chip, where the chip includes a programmable logic circuit and/or program instructions, and when the chip is run, the chip is configured to implement the method for analyzing a medical event provided in an embodiment of the present application.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
In the embodiments of the present application, the terms "first", "second", and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The term "one or more" refers to one or more, and the term "plurality" refers to two or more, unless expressly defined otherwise.
The term "and/or" in this application is only one kind of association relationship describing the associated object, and means that there may be three kinds of relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
The present application is intended to cover various modifications, equivalent replacements, and improvements of the embodiments described above that fall within the spirit and scope of the present application.

Claims (18)

1. A method of analyzing a medical event, comprising:
obtaining multi-omics data and interaction data of an organism, wherein the interaction data represents interaction relationships among the multi-omics data of the organism or among internal components of single-omics data;
and analyzing the multi-omics data and the interaction data to obtain an analysis result of the medical event.
2. The method of claim 1, wherein the multi-omics data comprises any one or more of: gene mutation data, gene expression data, deoxyribonucleic acid (DNA) methylation data, copy number variation data, micro ribonucleic acid (microRNA) expression data, histone modification data, gene fusion data, chromosome isomerism data, and metabolite expression data.
3. The method of claim 1 or 2, wherein the medical event comprises any one or more of the following events: analyzing the organism's sensitivity to drugs, analyzing the organism's susceptibility to gene interference, analyzing the type of condition from which the organism suffers, analyzing the species to which the organism belongs, analyzing the causative gene of the condition from which the organism suffers, analyzing the type of organism, and analyzing the organism's remaining life time.
4. The method according to any one of claims 1 to 3, wherein said analyzing the multi-omics data and the interaction data to obtain an analysis result of the medical event comprises:
performing data fusion based on the multi-omics data and the interaction data;
and analyzing the fused data to obtain the analysis result of the medical event.
5. The method of claim 4, wherein said performing data fusion based on the multi-omics data and the interaction data comprises:
converting the multi-omics data into the same multidimensional data space, the data in each dimension of the multidimensional data space representing one aspect of the factors that influence the analysis result of the medical event;
updating the multi-omics data converted into the multidimensional data space based on the interaction data;
and performing feature fusion on the updated multi-omics data and the multi-omics data converted into the multidimensional data space.
6. The method of any one of claims 1 to 5, wherein prior to said analyzing the multi-omics data and the interaction data, the method further comprises:
screening the interaction data based on the confidence of the interaction relationships in the interaction data to obtain updated interaction data.
7. The method according to any one of claims 1 to 6, wherein the interaction data analyzed is represented using an adjacency matrix or a graph structure;
and/or, the interaction data comprises any one or more of the following: protein interaction networks, gene regulation networks, gene co-expression networks, and metabolic networks.
8. The method of any one of claims 1 to 7, wherein said analyzing the multi-omics data and the interaction data to obtain an analysis result of the medical event comprises:
analyzing the multi-omics data and the interaction data using an AI model to obtain an analysis result of the medical event, wherein the AI model is obtained by training on sample multi-omics data and sample interaction data.
9. An apparatus for analyzing a medical event, comprising:
an acquisition module, configured to acquire multi-omics data and interaction data of an organism, wherein the interaction data represents interaction relationships among the multi-omics data of the organism or among internal components of single-omics data;
and an analysis module, configured to analyze the multi-omics data and the interaction data to obtain an analysis result of the medical event.
10. The apparatus of claim 9, wherein the multi-omics data comprises any one or more of: gene mutation data, gene expression data, deoxyribonucleic acid (DNA) methylation data, copy number variation data, micro ribonucleic acid (microRNA) expression data, histone modification data, gene fusion data, chromosome isomerism data, and metabolite expression data.
11. The apparatus of claim 9 or 10, wherein the medical event comprises any one or more of: analyzing the organism's sensitivity to drugs, analyzing the organism's susceptibility to gene interference, analyzing the type of condition from which the organism suffers, analyzing the species to which the organism belongs, analyzing the causative gene of the condition from which the organism suffers, analyzing the type of organism, and analyzing the organism's remaining life time.
12. The apparatus of any one of claims 9 to 11, wherein the analysis module comprises:
a fusion submodule, configured to perform data fusion based on the multi-omics data and the interaction data;
and the analysis submodule is used for analyzing the data after data fusion to obtain the analysis result of the medical event.
13. The apparatus of claim 12, wherein the fusion submodule is specifically configured to:
convert the multi-omics data into the same multidimensional data space, the data in each dimension of the multidimensional data space representing one aspect of the factors that influence the analysis result of the medical event;
update the multi-omics data converted into the multidimensional data space based on the interaction data;
and perform feature fusion on the updated multi-omics data and the multi-omics data converted into the multidimensional data space.
14. The apparatus of any one of claims 9 to 13, further comprising:
a preprocessing submodule, configured to screen the interaction data based on the confidence of the interaction relationships in the interaction data to obtain updated interaction data.
15. The apparatus according to any one of claims 9 to 14, wherein the interaction data being analyzed is represented using an adjacency matrix or a graph structure;
and/or, the interaction data comprises any one or more of the following: protein interaction networks, gene regulation networks, gene co-expression networks, and metabolic networks.
16. The apparatus according to any one of claims 9 to 15, wherein the analysis module is specifically configured to:
analyze the multi-omics data and the interaction data using an AI model to obtain an analysis result of the medical event, wherein the AI model is obtained by training on sample multi-omics data and sample interaction data.
17. A computing device, wherein the computing device comprises a processor and a memory;
the memory has stored therein a computer program;
the computer program, when executed by the processor, causes the computing device to perform the method of analyzing a medical event of any of the preceding claims 1 to 8.
18. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a computer, cause the computer to perform the method of analyzing a medical event of any of the preceding claims 1 to 8.
CN202010627570.9A 2020-07-02 2020-07-02 Medical event analysis method and device, computer equipment and storage medium Pending CN113889181A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010627570.9A CN113889181A (en) 2020-07-02 2020-07-02 Medical event analysis method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010627570.9A CN113889181A (en) 2020-07-02 2020-07-02 Medical event analysis method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113889181A true CN113889181A (en) 2022-01-04

Family

ID=79012480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010627570.9A Pending CN113889181A (en) 2020-07-02 2020-07-02 Medical event analysis method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113889181A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023231203A1 (en) * 2022-05-31 2023-12-07 医渡云(北京)技术有限公司 Drug efficacy prediction method and apparatus based on digital cell model, medium, and device

Similar Documents

Publication Publication Date Title
Ching et al. Opportunities and obstacles for deep learning in biology and medicine
Pirim et al. Clustering of high throughput gene expression data
Azadifar et al. Graph-based relevancy-redundancy gene selection method for cancer diagnosis
Zeng et al. Review of statistical learning methods in integrated omics studies (an integrated information science)
Zhang et al. On finding bicliques in bipartite graphs: a novel algorithm and its application to the integration of diverse biological data types
Vlasblom et al. Markov clustering versus affinity propagation for the partitioning of protein interaction graphs
Ressom et al. Increasing the efficiency of fuzzy logic-based gene expression data analysis
Kim et al. Integrative clustering of multi-level omics data for disease subtype discovery using sequential double regularization
Nepomuceno-Chamorro et al. Inferring gene regression networks with model trees
CN116741397B (en) Cancer typing method, system and storage medium based on multi-group data fusion
Ghadiri et al. BigFCM: Fast, precise and scalable FCM on hadoop
Tang et al. Explainable multi-task learning for multi-modality biological data analysis
Cheng et al. DGCyTOF: Deep learning with graphic cluster visualization to predict cell types of single cell mass cytometry data
Alzubaidi et al. A novel deep mining model for effective knowledge discovery from omics data
Medina-Ortiz et al. Dmakit: A user-friendly web platform for bringing state-of-the-art data analysis techniques to non-specific users
Eicher et al. Challenges in proteogenomics: a comparison of analysis methods with the case study of the DREAM proteogenomics sub-challenge
Yu et al. Protein complexes detection based on node local properties and gene expression in PPI weighted networks
Azadifar et al. A novel candidate disease gene prioritization method using deep graph convolutional networks and semi-supervised learning
Qattous et al. PaCMAP-embedded convolutional neural network for multi-omics data integration
CN113889181A (en) Medical event analysis method and device, computer equipment and storage medium
CN112466401B (en) Method and device for analyzing multiple types of data by utilizing artificial intelligence AI model group
Nimitha et al. An improved deep convolutional neural network architecture for chromosome abnormality detection using hybrid optimization model
Hulot et al. A unified framework for the integration of multiple hierarchical clusterings or networks from multi-source data
Fratello et al. Unsupervised algorithms for microarray sample stratification
Gruca et al. Rule based functional description of genes–estimation of the multicriteria rule interestingness measure by the UTA method

Legal Events

Date Code Title Description
PB01 Publication
TA01 Transfer of patent application right

Effective date of registration: 20220211

Address after: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province

Applicant after: Huawei Cloud Computing Technology Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Applicant before: HUAWEI TECHNOLOGIES Co.,Ltd.

SE01 Entry into force of request for substantive examination