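WO2022111385A1 - Graph neural network-based clinical omics data processing method, apparatus, device and medium - Google Patents
Graph neural network-based clinical omics data processing method, apparatus, device and medium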
- Publication number
- WO2022111385A1 (PCT/CN2021/131652)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- omics
- node
- features
- feature
- data
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/50—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Definitions
- the present application relates to the technical fields of medical treatment, artificial intelligence, cloud data, etc.
- the present application relates to a graph neural network-based omics data processing method, device, equipment and medium.
- omics concerns gene expression and protein expression at different stages of the life cycle and of disease development.
- omics is a systematic study.
- because omics can also reflect the life-cycle stage of the body and the development of diseases, omics data plays a crucial role in medical care.
- An embodiment of the present application provides a method for processing clinical omics data based on a graph neural network, the method comprising:
- a first graph structure corresponding to the first omics data is constructed, wherein the first graph structure includes at least two nodes, each node represents a first omics feature in the first omics data, the first graph structure includes at least one edge connecting the at least two nodes, and the edge represents the first correlation corresponding to the two connected nodes;
- the node feature of each node in the first graph structure is obtained, and the node feature has at least one dimension;
- the medical analysis includes disease diagnosis, disease classification and survival prediction for the target object;
- the medical analysis result includes the probability of the target object suffering from the disease corresponding to each dimension, the probability that the disease of the target object corresponding to each dimension is a certain disease category, and the survival probability of the target object corresponding to each dimension.
- an embodiment of the present application provides a graph neural network-based clinical omics data processing device, the device comprising:
- a data acquisition module for acquiring the first omics data of the target object, and extracting at least two first omics features from the first omics data;
- a correlation determination module for determining a first correlation between different omics features in the at least two first omics features;
- a graph structure building module configured to construct a first graph structure corresponding to the first omics data based on the at least two first omics features and the first correlation, wherein the first graph structure includes at least two nodes, each node represents a first omics feature in the first omics data, the first graph structure includes at least one edge connecting the at least two nodes, and the edge represents the first correlation corresponding to the two connected nodes;
- a node feature determination module configured to obtain, based on the first graph structure, the node feature of each node in the first graph structure through the first graph neural network, the node feature having at least one dimension;
- an analysis result determination module configured to perform a medical analysis on the target object based on the node features of each node, and obtain a medical analysis result corresponding to each dimension in the at least one dimension;
- the medical analysis includes performing disease diagnosis, disease classification and survival prediction on the target object;
- the medical analysis result includes, for each dimension, the probability of the target object suffering from the disease, the probability that the disease of the target object is a certain disease category, and the survival probability of the target object.
- an embodiment of the present application provides an electronic device, including a processor and a memory: the memory is configured to store a computer program, and when the computer program is executed by the processor, it causes the processor to execute the graph neural network-based omics data processing method described above.
- the embodiments of the present application provide a computer-readable storage medium, where the computer-readable storage medium is used to store a computer program, and when the computer program runs on a computer, the computer can execute the graph neural network-based omics data processing method described above.
- Fig. 1a is a schematic flowchart of a graph neural network-based omics data processing method provided by an embodiment of the application;
- Fig. 1b is a specific flowchart of obtaining the node features of each node in the first graph structure through the first graph neural network based on the first graph structure in step S104 of the embodiment of the application;
- Fig. 1c is a specific flowchart of obtaining the second feature of at least one level of the node for each node in the first graph structure according to an embodiment of the present application;
- Fig. 1d is a flowchart of a method for processing clinical omics data based on a graph neural network according to an embodiment of the present application;
- Fig. 2a is a schematic diagram of the principle of a graph neural network-based omics data processing method provided by an embodiment of the application;
- Fig. 2b is a schematic diagram of an edge matrix provided by an embodiment of the present application;
- Fig. 3 is a schematic structural diagram of a graph neural network-based omics data processing device provided by an embodiment of the present application;
- Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
- the omics analysis method based on machine learning specifically includes: first obtaining the omics features of the samples together with the sample category labels calibrated by doctors; then dividing all the omics features of the samples into a training set, a validation set and a test set according to a certain proportion; using the omics features of the samples in the training set as input and the corresponding sample category labels as supervision signals to train the model; screening the optimal parameters of the model according to its performance on the validation set to obtain the final model; and then making disease predictions on omics data based on the final model.
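The proportional split described above can be sketched as follows. This is a minimal illustration only: the 70/15/15 ratios, the fixed seed and the function name `split_samples` are assumptions, since the text only says "a certain proportion".

```python
import numpy as np

def split_samples(n_samples, ratios=(0.7, 0.15, 0.15), seed=0):
    """Shuffle sample indices and split them into train/validation/test sets.

    The ratios are an illustrative assumption, not values from the source.
    """
    assert abs(sum(ratios) - 1.0) < 1e-9, "ratios must sum to 1"
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)          # random order of sample indices
    n_train = int(round(ratios[0] * n_samples))
    n_val = int(round(ratios[1] * n_samples))
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]              # remainder goes to the test set
    return train, val, test

train_idx, val_idx, test_idx = split_samples(100)
```

The training indices would feed model fitting, the validation indices parameter selection, and the test indices the final evaluation.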
- the omics data analysis method based on machine learning has the following shortcomings:
- proteomics research has irreplaceable advantages for disease diagnosis, typing and prediction; ignoring the study of proteomics is a major obstacle to the realization of precision medicine.
- the embodiments of the present application provide a graph neural network-based omics data processing method, apparatus, device, and medium, which aim to solve some or all of the technical problems described above.
- the omics data can be processed based on artificial intelligence technology to obtain a corresponding medical prediction result.
- the features of each omics feature in the omics data to be processed can be obtained based on the machine learning technology in the artificial intelligence technology, and then the final medical prediction result can be obtained based on the features of each omics feature.
- Artificial Intelligence (AI) uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
- artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can respond in a similar way to human intelligence.
- Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
- Artificial intelligence technology is a comprehensive discipline, involving a wide range of fields, including both hardware-level technology and software-level technology.
- the basic technologies of artificial intelligence generally include technologies such as sensors, special artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
- Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
- Machine Learning is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in how computers simulate or realize human learning behaviors to acquire new knowledge or skills, and to reorganize existing knowledge structures to continuously improve their performance.
- Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and its applications are in all fields of artificial intelligence.
- Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, teaching learning and other technologies.
- the data processing/calculation involved in the embodiments of the present application may be performed based on cloud computing.
- cloud computing in the narrow sense refers to the delivery and use mode of IT infrastructure, that is, obtaining the required resources through the network in an on-demand and easily scalable manner;
- cloud computing in the broad sense refers to the delivery and use mode of services, that is, obtaining the required services through the network in an on-demand and easily scalable manner.
- services can be IT and software, Internet-related, or other services.
- Cloud computing is the product of the development and integration of traditional computer and network technologies such as grid computing, distributed computing, parallel computing, utility computing, network storage technologies, virtualization, and load balancing.
- Cloud computing has grown rapidly with the development of the Internet, real-time data streaming, the diversity of connected devices, and the need for search services, social networking, mobile commerce, and open collaboration. Different from the parallel distributed computing in the past, the emergence of cloud computing will promote revolutionary changes in the entire Internet model and enterprise management model.
- Omics: an important tool for systematically studying biological laws, mainly including genomics, proteomics, metabolomics, transcriptomics, lipidomics, immunomics, radiomics, ultrasomics, etc.
- Omics features: features of the various omics that can reflect biological laws.
- Biomarker: a biochemical indicator that can mark changes, or possible changes, in the structure or function of systems, organs, tissues, cells and subcellular structures. Biomarkers have a very wide range of uses: they can be used for disease diagnosis and disease staging, or to evaluate the safety and efficacy of new drugs or new therapies in the target population.
- Signaling pathway: the phenomenon in which, when a certain reaction is about to occur in a cell, a signal transmits information from outside the cell to the inside of the cell, and the cell responds according to that information.
- when an omics feature interacts with other omics features while performing its functions, the omics feature and those other omics features constitute a signaling pathway.
- the method provided by the embodiments of this application is performed by an electronic device, which may be a server or a terminal device. Specifically, the method may be executed by a terminal device or a server, or through data interaction between a server and a terminal device.
- the server may be an independent physical server, a server cluster or a distributed system composed of multiple physical servers, or a cloud server that provides cloud computing services.
- the terminal device may be a smart phone, a tablet computer, a notebook computer, a desktop computer, etc., but is not limited thereto.
- the terminal device and the server can be directly or indirectly connected through wired or wireless communication, which is not limited in this application.
- the terminal device can first send the omics data to be processed to the server, and the server performs medical analysis on the received omics data to obtain the medical analysis result.
- the medical analysis result is returned to the terminal device, and the terminal device provides it to the user.
- Fig. 1a shows a schematic flowchart of a graph neural network-based omics data processing method provided in an embodiment of the present application. As shown in Fig. 1a, the method includes:
- Step S101: acquiring first omics data of a target object, and extracting at least two first omics features from the first omics data.
- the first omics data of the target object refers to the omics data that needs to be medically analyzed; the first omics data includes at least two first omics features, which belong to the same omics category (for example, genomics) and to the same target object, but differ from one another.
- the category of the target object is not limited in this embodiment of the present application, for example, the target object may be a human being, or an animal, or the like.
- Step S102: determining a first correlation between different first omics features in the at least two first omics features.
- once the first correlation between different first omics features has been determined, the first omics features that perform similar functions can be associated based on that correlation.
- the correlation matrix between different omics features can be calculated by Weighted Gene Co-Expression Network Analysis (WGCNA); the correlation matrix can then be binarized by setting a threshold, and the binarized correlation matrix is called an edge matrix.
- when the correlation between two first omics features is not less than the threshold, the two first omics features perform similar functions and interact with each other (that is, they can form a signaling pathway), and the value of the element in the correlation matrix that characterizes the correlation between them is set to 1; when the correlation is less than the threshold, the correlation between the two first omics features is low, and the value of that element is set to 0.
- calculating the correlation between different first omics features by means of WGCNA gives the first omics features that perform similar functions a higher correlation; further, after the correlation matrix between different first omics features is obtained, it can be binarized so that the correlation between different first omics features is better highlighted.
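The thresholding step can be illustrated with a short sketch. It is not a WGCNA implementation: as a simplifying assumption it uses the absolute Pearson correlation from NumPy in place of the WGCNA correlation, and the threshold value 0.5, the function name and the choice to zero the diagonal (no self-edges) are all illustrative.

```python
import numpy as np

def edge_matrix_from_features(X, threshold=0.5):
    """Binarize a feature-feature correlation matrix into an edge matrix.

    X has shape (n_samples, n_features). Absolute Pearson correlation stands in
    for the WGCNA correlation here (an assumption made for brevity).
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))   # (n_features, n_features)
    edges = (corr >= threshold).astype(int)       # 1 if not less than the threshold
    np.fill_diagonal(edges, 0)                    # assumption: no self-edges
    return corr, edges

# Toy data: feature 1 is exactly twice feature 0; feature 2 is uncorrelated with both.
X = np.array([[1.0, 2.0,  1.0],
              [2.0, 4.0, -1.0],
              [3.0, 6.0, -1.0],
              [4.0, 8.0,  1.0]])
corr, edges = edge_matrix_from_features(X)
```

In this toy case only the element linking features 0 and 1 is set to 1, so only that pair would later be connected.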
- Step S103: constructing a first graph structure corresponding to the first omics data based on the at least two first omics features and the first correlation, wherein the first graph structure includes at least two nodes, each node represents a first omics feature in the first omics data, the first graph structure includes at least one edge connecting the at least two nodes, and the edge represents the first correlation between the two first omics features corresponding to the two connected nodes.
- the graph structure includes the nodes and the edges connecting the nodes.
- each node in the graph structure represents a first omics feature, and an edge connecting two nodes represents the first correlation between the two first omics features corresponding to those two nodes.
- the nodes included in the first graph structure can be obtained from the first omics features included in the first omics data, and the first correlation between different first omics features then determines which pairs of nodes in the first graph structure are to be connected, so as to obtain the first graph structure corresponding to the first omics data.
- a first graph structure corresponding to the first omics data is constructed based on at least two first omics features and each first correlation, including:
- for any two first omics features, if the first correlation between the two first omics features is greater than or equal to a set value, an edge is established between the two nodes corresponding to the two first omics features.
- once the first correlation between different first omics features is known, for any two first omics features in the first omics data, if the first correlation between them is determined to be greater than or equal to the set value, the two first omics features perform similar functions and are highly correlated, so an edge is established between the two nodes corresponding to the two first omics features.
- the first correlation between different first omics features can be represented by the above-mentioned edge matrix; when constructing the first graph structure corresponding to the first omics data, for any two first omics features, if the element value of the first correlation between them in the edge matrix is 1, an edge is established between the two corresponding nodes; if the element value is 0, no edge is established between the two corresponding nodes.
- omics features that perform similar functions can thus be connected in the graph, which reflects not only each single omics feature but also the relationships between different omics features; this can better reveal the pathogenic mechanism and simulate the biological process, so as to obtain a more accurate disease prediction.
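As a rough sketch of the construction in step S103, a binary edge matrix can be turned into a node list and an undirected edge list. The function name `build_graph` and the feature names `gene_a`, `gene_b`, `gene_c` are hypothetical placeholders, not identifiers from the application.

```python
import numpy as np

def build_graph(feature_names, edge_matrix):
    """Return (nodes, edges) for an undirected graph from a binary edge matrix.

    One node per omics feature; one edge per pair whose matrix element is 1.
    """
    edge_matrix = np.asarray(edge_matrix)
    n = len(feature_names)
    assert edge_matrix.shape == (n, n), "edge matrix must be n x n"
    # Use only the upper triangle so that each undirected edge is listed once.
    rows, cols = np.nonzero(np.triu(edge_matrix, k=1))
    edges = [(feature_names[i], feature_names[j]) for i, j in zip(rows, cols)]
    return list(feature_names), edges

names = ["gene_a", "gene_b", "gene_c"]           # hypothetical feature names
edge_matrix = [[0, 1, 0],
               [1, 0, 0],
               [0, 0, 0]]
nodes, graph_edges = build_graph(names, edge_matrix)
```

Here `gene_c` remains an isolated node because every element in its row of the edge matrix is 0.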
- Step S104: based on the first graph structure, obtaining the node feature of each node in the first graph structure through the first graph neural network.
- the node feature has at least one dimension.
- when the node feature has multiple dimensions, the node feature may be a sequence or an array.
- the first graph neural network is the graph neural network corresponding to the omics to which the first omics data belongs, and its specific type can be pre-configured; for example, it can be an attention-based graph convolutional network, that is, a Graph Attention Network (GAT), or another graph neural network such as a graph convolutional network or a graph autoencoder network, which is not limited in the embodiments of this application.
- when the first graph structure corresponding to the first omics data is obtained, the feature of each first omics feature, that is, the node feature of each node in the first graph structure, can be obtained through the graph neural network corresponding to the first omics data.
- the method further includes:
- Fig. 1b shows a specific flowchart of obtaining the node features of each node in the first graph structure through the first graph neural network based on the first graph structure in step S104.
- step S104 specifically includes:
- Step S1041 for each node in the first graph structure, obtain, through the first graph neural network, a second feature of at least one level of the node based on the node and each target node that has an edge relationship with the node in the first graph structure;
- Step S1042 for each node, fuse the first feature corresponding to the node with each second feature to obtain the node feature of the node.
- a first feature characterizing each node can be extracted, i.e., a feature characterizing each first omics feature itself.
- for each node, each target node that has an edge relationship with the node can be determined, and the second feature of the node is then determined by the first graph neural network based on the first feature of each target node and the first feature of the node.
- the first graph neural network may include at least one feature extraction layer (e.g., a GAT layer), and the output of each feature extraction layer corresponds to the second feature of one level. The input of the first feature extraction layer is the first feature of each node in the first graph structure and the edge relationships between the nodes; the input of every other feature extraction layer is the second feature of each node output by the previous feature extraction layer and the edge relationships between the nodes.
- the feature extraction layer may be a graph convolution (GAT) layer based on an attention mechanism.
- for each node, the first feature corresponding to the node and the at least one second feature may be fused, and the fused feature is used as the node feature of the node.
- medical analysis is performed based on the node feature of each node to obtain a corresponding medical analysis result.
- when the first feature and the at least one second feature of each node are fused, each of them can be mapped to the same dimension through its own fully connected layer; the mapped features are then fused by splicing (concatenation), and the fused features are used as the node features of each node.
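- A minimal sketch of this mapping and splicing step is given below, assuming NumPy arrays of shape (number of nodes, feature dimension). The helper names `fully_connected` and `fuse_levels`, the random weights, and the chosen dimensions are illustrative assumptions, not values from the application.

```python
import numpy as np

def fully_connected(x, w, b):
    """Linear mapping used here as a stand-in for a trained fully connected layer."""
    return x @ w + b

def fuse_levels(level_features, weights, biases):
    """Map each level to a common dimension, then concatenate along the feature axis."""
    mapped = [fully_connected(x, w, b)
              for x, w, b in zip(level_features, weights, biases)]
    return np.concatenate(mapped, axis=-1)

rng = np.random.default_rng(0)
first = rng.normal(size=(3, 1))      # first feature of 3 nodes
second_1 = rng.normal(size=(3, 4))   # second feature, level 1
second_2 = rng.normal(size=(3, 8))   # second feature, level 2
levels = [first, second_1, second_2]

# One (untrained, random) fully connected layer per level, all mapping to dimension 2.
weights = [rng.normal(size=(x.shape[1], 2)) for x in levels]
biases = [np.zeros(2) for _ in levels]

node_features = fuse_levels(levels, weights, biases)  # shape (3, 6)
```

- Each node ends up with a feature whose length is the common dimension multiplied by the number of levels.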
- since the second feature of each node is obtained by fusing the features of the nodes constituting a signal pathway, the second feature of each node fuses the omics features of other nodes (that is, the second feature is a feature at the signal pathway level);
- the node feature of each node, obtained by fusing the first feature and the second feature of the node, therefore includes both the feature at the level of a single omics feature (i.e., the first feature) and the feature at the signal pathway level (i.e., the second feature). It can better characterize the omics features corresponding to the first omics data of the target object, so that the analysis result obtained when performing medical analysis based on the first omics data can be more accurate.
- FIG. 1c shows a specific flowchart, in an embodiment of the present application, of obtaining, for each node in the first graph structure, a second feature of at least one level of the node through the first graph neural network based on the node and each target node connected to the node in the first graph structure. The process includes:
- Step S1043 obtaining initial features of each node of the first graph structure, wherein, when determining the second feature of the first level of each node, the initial feature of each node is the first feature corresponding to each node; When determining the second feature of any level other than the feature of the first level, the initial feature of each node is the second feature of the previous level of the level;
- Step S1044 for each node, based on each associated feature of the node, determine the weight of each associated feature through the first graph neural network, wherein the associated features include the initial feature of the node and the initial feature of each target node that has an edge relationship with the node;
- Step S1045 for each node, based on the weight of each associated feature of the node, weight and fuse the associated features of the node through the first graph neural network to obtain a second feature of one level of the node;
- the second feature of any level except the second feature of the first level is obtained based on the second feature of the previous level of the level.
- specifically, the initial feature of each node may be determined first. If a node corresponds to second features of at least two levels, the second feature of any level other than the first level is obtained based on the second feature of the previous level. That is, when determining the second feature of the first level of each node, the initial feature of each node is the first feature corresponding to the node; when determining the second feature of any other level, the initial feature of each node is the second feature of the previous level.
- for any node, if there is an edge between the node and another node, the first omics features corresponding to the two nodes perform similar functions and can form a signal pathway. However, the nodes differ in importance when performing the function, and this importance can be represented by a weight.
- for each node, each target node connected to the node may be determined. Based on the initial feature of the node and the initial feature of each target node (that is, the associated features of the node), the weight of each associated feature is determined through the first graph neural network. The initial feature of the node and the initial features of the target nodes are then weighted by their corresponding weights, the weighted initial features are fused, and the fused feature is used as the second feature of one level of the node.
- in an example, the first graph neural network includes two GAT layers, and the first graph structure includes 3 nodes (node 1 to node 3), where node 1 is connected to node 2 and node 3 respectively, and node 2 and node 3 each have an edge only with node 1.
- the first features of nodes 1 to 3 can be determined respectively.
- for node 1, the target nodes can be determined as node 2 and node 3. Based on the first feature of node 1 (i.e., the initial feature of node 1) and the first features of nodes 2 and 3 (i.e., the initial features of nodes 2 and 3), the first GAT layer determines the weights of the first features of nodes 1 to 3, and then weights and fuses the first features of nodes 1 to 3 according to those weights to obtain the second feature of the first level of node 1.
- similarly, the second features of the first level of node 2 and node 3 can be obtained. Further, for node 1, the second features of the first level of nodes 1 to 3 can be used as the associated features of node 1; the weight of each associated feature is determined through the second GAT layer, and the associated features of node 1 are weighted and fused based on those weights to obtain the second feature of the second level of node 1. Similarly, the second features of the second level of node 2 and node 3 can be obtained respectively.
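- The three-node example can be sketched with a simplified single-head attention layer. This is an assumption-laden illustration rather than the application's network: the function `gat_layer`, the LeakyReLU scoring form, the random untrained parameters, and the feature dimensions are all hypothetical, and a real GAT layer may differ in details such as multiple heads or output activations.

```python
import numpy as np

def gat_layer(h, adj, w, a, slope=0.2):
    """Simplified attention layer: each node fuses itself and its neighbours.

    h: (n, d_in) initial features, adj: (n, n) 0/1 edges without self loops,
    w: (d_in, d_out) projection, a: (2 * d_out,) attention vector.
    """
    n = h.shape[0]
    z = h @ w
    d_out = z.shape[1]
    # score[i, j] = LeakyReLU(a_left . z_i + a_right . z_j)
    s = (z @ a[:d_out])[:, None] + (z @ a[d_out:])[None, :]
    s = np.where(s > 0, s, slope * s)
    # A node attends only to itself and to nodes it shares an edge with.
    mask = (adj + np.eye(n)) > 0
    s = np.where(mask, s, -np.inf)
    s = s - s.max(axis=1, keepdims=True)
    e = np.exp(s)
    alpha = e / e.sum(axis=1, keepdims=True)   # weights of the associated features
    return alpha @ z, alpha                    # weighted fusion, and the weights

# Node 1 (index 0) is connected to node 2 and node 3; nodes 2 and 3 are not connected.
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]])
rng = np.random.default_rng(0)
g1 = rng.normal(size=(3, 2))                       # first features
w1, a1 = rng.normal(size=(2, 4)), rng.normal(size=8)
w2, a2 = rng.normal(size=(4, 4)), rng.normal(size=8)

g2, alpha1 = gat_layer(g1, adj, w1, a1)            # second features, first level
g3, alpha2 = gat_layer(g2, adj, w2, a2)            # second features, second level
```

- Because node 2 and node 3 share no edge, the weight each assigns to the other is zero in both layers, while node 1 distributes its weights over all three nodes.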
- in this way, omics features with similar functions (the omics features constituting a signaling pathway) are fused together, and the resulting second feature is a feature at the signal pathway level, which allows more attention to be paid to high-level regulatory factors.
- Step S105 based on the node feature of each node, perform medical analysis on the target object to obtain a medical analysis result corresponding to each dimension in the at least one dimension. The medical analysis includes performing disease diagnosis, disease classification, and survival prediction on the target object; the medical analysis results include, for each dimension, the probability that the target object has a disease, the probability that the disease of the target object belongs to a certain disease category, and the survival probability of the target object.
- medical analysis may be performed based on the features of each node to obtain corresponding medical analysis results.
- the categories specifically included in the medical analysis may be pre-configured, which is not limited in the embodiment of the present application.
- at least one of disease identification, disease classification, or survival prediction may be performed based on the node features of each node; in this case, the obtained medical analysis results may include at least one of disease identification results, disease classification results, or survival prediction results.
- each omics feature in the omics data can be graph-structured according to the correlations between different omics features, so that the correlation and regulation relationships between omics features in biology can be effectively simulated and the state of the omics features better represented. Correspondingly, based on the graph structure, the node feature of each node in the graph structure can be obtained through the graph neural network, and the corresponding medical analysis result can then be obtained from the node features.
- because the graph-structured omics features effectively simulate the correlation and regulation relationships between omics features in biology, the node feature of each node obtained through the graph neural network integrates the features of other nodes, is a comprehensive feature at the signal pathway level, and reflects the correlation and regulatory relationships between the omics features. The medical analysis result obtained from these node features is therefore more accurate.
- Fig. 1d shows a flowchart of a method for processing clinical omics data based on a graph neural network according to an embodiment of the present application. As shown in Figure 1d, the method further includes:
- Step S106 obtain at least one second omics data, wherein the first omics data and each of the at least one second omics data belong to different omics, and the at least one second omics data and the first omics data belong to the same target object;
- Step S107 extracting data features corresponding to each second omics data
- Step S1051 Determine the medical analysis result of the target object based on the node feature of each node and the data feature corresponding to each second omics data.
- each second omics data and the first omics data belong to the same target object but to different categories of omics; for example, the first omics data is genomics data, the second omics data is proteomics data, and both belong to Person A.
- the first omics data and each second omics data may differ in importance when determining the medical analysis result.
- different weights can therefore be set to represent the importance of the first omics data and of each second omics data in determining the medical analysis result. Correspondingly, when determining the medical analysis result, the data features corresponding to each second omics data and the node features of each node can be weighted and fused according to the weights of the first omics data and each second omics data, and the medical analysis result of the target object is then determined based on the fused features.
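- One possible reading of this weighted fusion is sketched below. The application does not specify the fusion operator, so the choices here are assumptions: the hypothetical function `fuse_omics` normalizes the importance weights to sum to 1, scales each omics feature vector by its weight, and concatenates the results.

```python
import numpy as np

def fuse_omics(features, importance):
    """Scale each omics feature vector by its normalized importance, then concatenate."""
    w = np.asarray(importance, dtype=float)
    w = w / w.sum()  # assumption: weights are normalized to sum to 1
    return np.concatenate([wi * np.asarray(f, dtype=float)
                           for wi, f in zip(w, features)])

# Hypothetical flattened features for two omics of the same target object.
genomics_features = [1.0, 2.0]     # e.g. node features from the first graph structure
proteomics_features = [4.0]        # e.g. data feature of one second omics data
fused = fuse_omics([genomics_features, proteomics_features], importance=[3, 1])
```

- Other fusion operators, such as a weighted sum of equally sized vectors, would fit the description equally well.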
- each second omics data includes at least two second omics features
- the node features of each node in the first graph structure are obtained, including:
- the node features of each node of the first graph structure are obtained through the first graph neural network corresponding to the omics to which the first omics data belongs;
- the node features of each node corresponding to the second omics data are obtained, so as to obtain the data feature corresponding to the second omics data, wherein the data feature includes the node feature of each node corresponding to the second omics data.
- the second omics data includes at least two second omics features, and the second omics features and the first omics features belong to different categories of omics.
- the graph neural network corresponding to each omics can be preconfigured, for example, the graph neural network corresponding to genomics, the graph neural network corresponding to proteomics, etc. can be preconfigured.
- the corresponding graph neural network is trained based on the omics features of different types of samples, and the network parameters of the graph neural network corresponding to each omics are different.
- the node features of each node of the first graph structure can be obtained based on the first graph neural network corresponding to the omics to which the first omics data belongs;
- for each second omics data, the second correlation between different second omics features among the at least two second omics features included in the second omics data can be determined, and the second graph structure corresponding to the second omics data is then constructed according to the second omics features and the second correlations between them.
- a node in the second graph structure represents a second omics feature, and a connecting edge in the second graph structure represents the second correlation between the two second omics features corresponding to the two nodes it connects; further, based on the second graph structure, the node features of each node in the second graph structure, that is, the data features corresponding to the second omics data, are obtained through the second graph neural network corresponding to the omics to which the second omics data belongs.
- obtaining the node features of each node in the first graph structure through the first graph neural network, and obtaining the medical analysis result based on the node features of each node, are both performed by an analysis result prediction model, wherein the analysis result prediction model is obtained by training an initial neural network model based on the omics data of each sample.
- the omics data of each sample and the initial neural network model may be obtained, and then the initial neural network model is trained based on the obtained omics data of each sample to obtain an analysis result prediction model.
- the first graph structure corresponding to the first omics data can be determined first and then input into the analysis result prediction model. Based on the first graph structure, the analysis result prediction model obtains the node features of each node in the first graph structure through the first graph neural network, and then obtains and outputs the medical analysis result based on the node features of each node.
- the analysis result prediction model is obtained in the following manner:
- the training data set includes the omics data of each sample and the label corresponding to the omics data of each sample, and the label represents the real medical analysis result;
- the initial neural network model is iteratively trained based on different sub-data sets until the preset training end conditions are met;
- model parameters of the initial neural network model corresponding to the end of each training are fused, and the fused model parameters are used as the model parameters of the analysis result prediction model.
- a training data set and an initial neural network model may be obtained, wherein the training data set includes the omics data of each sample and the label corresponding to the omics data of each sample, and the label represents the real medical analysis result.
- the training data set can be divided into different sub-data sets. For each sub-data set, the initial neural network model can be trained separately to obtain the trained initial neural network model corresponding to that sub-data set. Correspondingly, once the trained initial neural network model corresponding to each sub-data set is obtained, the model parameters of these models can be fused, and the fused model parameters are used as the model parameters of the analysis result prediction model.
- for example, multiple sample omics data can be acquired at one time and randomly divided into 5 sub-data sets. One sub-data set is taken as the test set and the remaining 4 sub-data sets are used as the training set to train the initial neural network model; after each of the 5 sub-data sets has been taken as the test set in turn, the initial neural network models corresponding to the 5 training runs are obtained.
- the values of the network parameters in the 5 trained initial neural network models can be averaged, and the averaged network parameter values can be used as the network parameter values of the analysis result prediction model. That is, a five-fold cross-validation method is used to train the initial neural network model to obtain the analysis result prediction model.
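- The fold rotation and parameter averaging can be sketched as follows. To keep the example short and runnable, a least-squares linear model stands in for the initial neural network model; the names `five_fold_average` and `train_linear`, the synthetic data, and the random seed are assumptions made for illustration only.

```python
import numpy as np

def five_fold_average(x, y, train_fn, n_folds=5, seed=0):
    """Train once per fold on the other folds and average the resulting parameters."""
    idx = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(idx, n_folds)
    params = []
    for k in range(n_folds):
        # Fold k is held out as the test set; the remaining folds form the training set.
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        params.append(train_fn(x[train_idx], y[train_idx]))
    return np.mean(params, axis=0)

def train_linear(x, y):
    """Stand-in for model training: least-squares fit of a linear model."""
    w, *_ = np.linalg.lstsq(x, y, rcond=None)
    return w

rng = np.random.default_rng(1)
x = rng.normal(size=(50, 3))          # 50 synthetic samples with 3 features
true_w = np.array([1.0, -2.0, 0.5])
y = x @ true_w                        # noise-free synthetic labels
w_avg = five_fold_average(x, y, train_linear)
```

- In this sketch the held-out fold is only excluded from training; in practice it could also be used to evaluate each of the five models before their parameters are averaged.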
- the training end condition can be the convergence of the value of the loss function corresponding to the initial neural network model. The value of the loss function represents the difference between the medical analysis result predicted from the sample omics data and the real medical analysis result of the sample omics data; when the value of the loss function converges, the accuracy of the current initial neural network model has met the requirements, and the training can be ended.
- when the output of the analysis result prediction model differs, the loss function used to train it also differs.
- when the category of medical analysis is disease diagnosis or disease classification, the predicted medical analysis result output by the initial neural network model is the predicted probability of the sample omics data for each category; in this case, the initial neural network model can be trained by minimizing the cross-entropy between the predicted probabilities of all sample omics data and the medical analysis result labels. When the category of medical analysis is survival prediction, the predicted medical analysis result output by the initial neural network model is the risk coefficient of the patient, and the initial neural network model can be trained with the loss function of the Cox proportional hazards model.
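- The two loss functions can be sketched in NumPy as below. The application only names the losses, so the details are assumptions: `cross_entropy` takes integer class labels, and `cox_loss` is the negative Cox partial log-likelihood averaged over observed events, with tied times handled in the Breslow manner.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class of each sample."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels])))

def cox_loss(risk, time, event):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk: predicted risk coefficient per patient, time: follow-up time,
    event: 1 if the event was observed, 0 if the patient was censored.
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    # at_risk[i, j] is True when patient j is still at risk at time[i].
    at_risk = time[None, :] >= time[:, None]
    log_denominator = np.log((np.exp(risk)[None, :] * at_risk).sum(axis=1))
    return float(-np.sum((risk - log_denominator)[event]) / max(event.sum(), 1))

ce = cross_entropy([[0.5, 0.5], [0.5, 0.5]], [0, 1])
cox = cox_loss(risk=[0.0, 0.0, 0.0], time=[1.0, 2.0, 3.0], event=[1, 1, 1])
```

- With uniform predictions the cross-entropy equals log 2, and with equal risk coefficients the Cox loss depends only on the sizes of the risk sets.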
- the omics data can be automatically analyzed based on the analysis result prediction model provided in the embodiments of the present application, so as to obtain early diagnosis and prediction results of diseases.
- when the analysis result prediction model provided in the embodiment of the present application automatically analyzes the omics data, the cascading regulatory network in biology is simulated by the graph-structure processing of the omics data, so that the analysis result prediction model has higher interpretability and higher clinical applicability. Further, compared with traditional analysis methods based on statistical tests and manual judgment, the final results are obtained automatically, which saves the time spent analyzing the omics data and waiting, avoids manual judgment errors, and improves the accuracy of medical analysis results.
- the method further includes:
- the importance parameter value of each first omics feature is determined in the following ways:
- for each node, the importance parameter value of the node is obtained based on the importance parameter values of the node corresponding to the omics data of all samples, and is taken as the importance parameter value of the omics feature corresponding to the node.
- the importance parameter value of the first omics feature represents the importance of the first omics feature in the signal pathway constructed by the first omics feature.
- the importance parameter value of each first omics feature may also be obtained, and the importance of each first omics feature may be provided to the user (for example, to medical staff) together with the obtained medical analysis result. Medical staff can then identify, from the importance parameter values of the first omics features, the omics features that play an important role in the medical analysis result and propose a biological explanation, which helps patients receive more accurate medical treatment and achieves the purpose of precise treatment.
- the importance parameter value of the first omics feature is the importance of the node corresponding to the first omics feature in the first graph structure , and the importance of each node in the first graph structure can be determined based on the medical analysis results corresponding to the sample omics data, specifically:
- the medical analysis result corresponding to the sample omics data can be obtained based on the analysis result prediction model, and in the process of obtaining it, each feature of each node in the graph structure corresponding to the sample omics data (including the first feature and the second feature of at least one level) is obtained. Based on the medical analysis result, a gradient calculation (such as a derivative calculation) can then be performed on each feature of each node to obtain a calculated value for each feature, and the calculated values are summed to obtain the importance parameter value of each node in the graph structure of the sample omics data.
- the number of nodes in the graph structure of each sample omics data is the same, and the sample omics feature attribute represented by each node is also the same; correspondingly, for any node in the graph structure, the importance parameter value of the node can be obtained based on the importance parameter values of the node in the graph structures of all sample omics data, and is used as the importance parameter value of the omics feature corresponding to the node.
- for example, the importance parameter values of the node in the graph structures of all sample omics data can be summed, and the obtained sum can be used as the importance parameter value of the node.
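- The gradient-and-sum procedure can be sketched as follows. Several points are assumptions: the gradient is approximated by central finite differences instead of automatic differentiation, the hypothetical `predict` function is a toy linear scorer standing in for the analysis result prediction model, and all features of a node are represented as one row of a (number of nodes, number of features) array.

```python
import numpy as np

def node_importance(predict, node_feats, eps=1e-5):
    """Sum, per node, the gradient of the prediction w.r.t. each feature of the node."""
    node_feats = np.asarray(node_feats, dtype=float)
    grads = np.zeros_like(node_feats)
    for idx in np.ndindex(node_feats.shape):
        up, down = node_feats.copy(), node_feats.copy()
        up[idx] += eps
        down[idx] -= eps
        # Central finite difference as a stand-in for the derivative calculation.
        grads[idx] = (predict(up) - predict(down)) / (2 * eps)
    return grads.sum(axis=1)

def dataset_importance(predict, samples):
    """Sum each node's importance over the graph structures of all samples."""
    return sum(node_importance(predict, s) for s in samples)

# Toy scorer: a fixed linear function of the node features (3 nodes, 2 features each).
w = np.array([[1.0, 2.0],
              [0.5, -0.5],
              [0.0, 3.0]])

def predict(feats):
    return float((w * feats).sum())

rng = np.random.default_rng(0)
samples = [rng.normal(size=(3, 2)) for _ in range(2)]
importance = dataset_importance(predict, samples)
```

- For this linear scorer the gradient equals `w` for every sample, so the second node, whose gradients cancel, receives an importance close to zero.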
- important sample omics features can be determined based on the importance parameter value of each node, and then the determined important sample omics features can be enriched in signaling pathways (such as enrichment of signaling pathways through the Metascape platform), thereby Find omics signatures that can serve as biomarkers at the signaling pathway level.
- obtaining the importance parameter value of each sample omics feature by gradient calculation not only provides an explanation and basis for the medical analysis result, but also allows the neural network model to be tested and corrected based on the importance parameter values of the sample omics features.
- biomarkers that play an important role in disease prediction can also be obtained from the important sample omics features so determined, enabling more accurate disease prediction and determination of disease types.
- acquiring the first omics data to be processed includes:
- the initial omics data includes at least two initial omics characteristics
- the correlation omics features of the initial omics data, the correlation omics features and the initial omics data belong to the same target object, and the correlation omics features include at least one of case omics features or radiomic features;
- Each of the initial omics features and the associated omics features are fused respectively to obtain a fused omics feature corresponding to each initial omics feature, and use it as a first omics feature.
- the correlation omics feature refers to a feature that is associated with the initial omics data and belongs to the same target object as the initial omics data. Its specific category is not limited in the embodiment of the present application; for example, the correlation omics features may include at least one of case-omics features or radiomics features of the target object.
- when acquiring the first omics data to be processed, the initial omics data including at least two kinds of initial omics features, and the associated omics features belonging to the same target object as the initial omics data, may be obtained; each initial omics feature is then fused with the associated omics features to obtain the fused omics feature corresponding to that initial omics feature, and the fused omics features are used as the first omics features included in the first omics data.
- since the omics data to be processed for determining the medical analysis result fuses the omics features of the target object with their associated features, the feature expression of the omics data to be processed is richer, so that a more comprehensive and accurate medical analysis is achieved and the accuracy of medical analysis results is improved.
- the omics data of N patients can be obtained, where the omics data of each patient includes K different omics features (that is, the K omics features in the figure).
- the omics data is used as training data (that is, the training data $X_{N \times K}$ in the figure) to train the initial graph neural network to obtain an analysis result prediction model; further, a patient's omics data can be medically analyzed based on the analysis result prediction model to obtain the final medical analysis result.
- the omics data $V \in \mathbb{R}^{K}$ (i.e., the first omics data to be processed, which includes K different omics features) corresponds to the omics data of one of the N patients.
- the method provided by the embodiment of the present application will be described in detail by taking the medical analysis result as an example, which may specifically include:
- determining the medical analysis result corresponding to the patient's omics data may include three parts: (a) gene co-expression analysis, (b) multi-level graph feature extraction and fusion, and (c) multi-task prediction. Of these, the multi-level graph feature extraction and fusion and the multi-task prediction can be implemented by the trained analysis result prediction model.
- the gene co-expression analysis part is performed first on the patient's omics data, and the obtained result is then input to the analysis result prediction model to obtain the final medical analysis result.
- WGCNA: weighted gene co-expression network analysis technology
- feature extraction can be performed twice on the first feature G1 of each node by two attention-based graph convolution (GAT) layers (the feature extraction layers described above, shown as GAT layers in the figure) to obtain the second features G2 and G3 of two levels for each node. When determining G2, the first GAT layer weights and sums the first features of connected nodes according to the attention values to obtain the second feature of the first level of each node.
- the three-level features G1, G2, and G3 can be mapped to features of the same dimension through their respective fully connected layers.
- the feature mapped from G1 is $F_1 \in \mathbb{R}^{K}$, the feature mapped from G2 is $F_2 \in \mathbb{R}^{K}$, and the feature mapped from G3 is $F_3 \in \mathbb{R}^{K}$.
- the three-level features $F_1$, $F_2$ and $F_3$ are fused by splicing to obtain the fused feature $F \in \mathbb{R}^{3K}$, which can then be used for disease diagnosis, disease classification or survival prediction.
- further feature extraction (i.e., feature mapping) may be performed on the fused feature to obtain a feature in $\mathbb{R}^{d_1}$ (where $\mathbb{R}^{d_1}$ indicates that the feature has $d_1$ dimensions), and disease diagnosis, disease classification or survival prediction can then be performed based on this feature (i.e., the (c) multi-task prediction part in the figure).
- when disease diagnosis or disease classification is performed based on the feature in $\mathbb{R}^{d_1}$ (i.e., the disease diagnosis and classification in the figure), the feature can be mapped to a feature whose dimension equals the number of disease types or disease categories (c disease types or categories in this example), and the disease prediction result or disease classification prediction result in $\mathbb{R}^{c}$ (that is, the medical analysis result described above) is then obtained from the mapped feature.
- in this case, the output y of the analysis result prediction model satisfies $y \in \mathbb{R}^{c}$ and represents the probability that the patient's omics data corresponds to each disease, or to each category of disease. When survival prediction is performed based on the feature in $\mathbb{R}^{d_1}$, the survival probability corresponding to the patient's omics data (that is, the medical analysis result described above) is obtained from the feature, and the output y of the analysis result prediction model satisfies $y \in \mathbb{R}^{1}$.
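- The two output heads can be sketched as below. The head names, the softmax used to turn the c mapped values into probabilities, the plain linear survival head, and the dimensions d1 = 16 and c = 4 are assumptions for illustration; the application only states the output dimensions.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def classification_head(feat, w, b):
    """Map a d1-dimensional feature to c class probabilities."""
    return softmax(feat @ w + b)

def survival_head(feat, w, b):
    """Map a d1-dimensional feature to a single survival-related value."""
    return (feat @ w + b).reshape(1)

rng = np.random.default_rng(0)
d1, c = 16, 4                                  # hypothetical dimensions
feat = rng.normal(size=d1)                     # feature after the further mapping
y_cls = classification_head(feat, rng.normal(size=(d1, c)), np.zeros(c))
y_surv = survival_head(feat, rng.normal(size=(d1, 1)), np.zeros(1))
```

- The classification output has one entry per disease type or category and sums to 1, while the survival output is a single value.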
- the method provided by the embodiment of the present application simulates the biological cascade regulation network by graph-structuring the omics data, and then uses the graph neural network to fully mine the influence of the associations and interactions between omics data on the development of disease. Graph-structure features of different levels can be fused, so that not only information at the level of a single omics feature but also comprehensive features at the signal pathway level are extracted, which better represents the data.
- the analysis can be performed automatically based on the analysis result prediction model without manual intervention, saving analysis and waiting time. Compared with traditional technical solutions, it has obvious advantages: omics data analysis can be performed more intelligently and accurately, so as to provide more accurate medical interventions and meet the actual needs of medical staff.
- The graph neural network-based omics data processing apparatus 60 may include a data acquisition module 601, a correlation determination module 602, a graph structure building module 603, a node feature determination module 604, and an analysis result determination module 605, wherein:
- the data acquisition module 601 is configured to acquire the first omics data of the target object, and extract at least two first omics features from the first omics data;
- the correlation determination module 602 is configured to determine a first correlation between different omics features among the at least two first omics features;
- the graph structure building module 603 is configured to construct a first graph structure corresponding to the first omics data based on the at least two first omics features and the first correlation, wherein the first graph structure includes at least two nodes, each node represents one first omics feature in the first omics data, the first graph structure includes at least one edge connecting the at least two nodes, and the edge represents the first correlation corresponding to the two nodes it connects;
- the node feature determination module 604 is configured to obtain, based on the first graph structure and through the first graph neural network, the node feature of each node in the first graph structure, the node feature having at least one dimension;
- the analysis result determination module 605 is configured to perform medical analysis on the target object based on the node features of the nodes, and obtain a medical analysis result corresponding to each of the at least one dimension; the medical analysis includes disease diagnosis, disease classification, and survival prediction for the target object; the medical analysis results include, for each dimension, the probability that the target object has a disease, the probability that the target object's disease belongs to a given disease category, and the survival probability of the target object.
- When constructing the first graph structure corresponding to the first omics data based on the at least two first omics features and each first correlation, the graph structure building module is specifically configured to:
- for any two first omics features among the at least two first omics features, if the first correlation between the two is greater than or equal to a set value, establish an edge between the two nodes corresponding to those two first omics features, so as to construct the first graph structure.
- The apparatus further includes a feature extraction module configured to:
- for each node in the first graph structure, extract a first feature of the corresponding first omics feature, where the first feature is the feature of each node in the first graph structure that includes only the single first omics feature itself;
- When obtaining the node features of each node in the first graph structure through the first graph neural network based on the first graph structure, the node feature determination module is specifically configured to:
- for each node in the first graph structure, obtain, by the first graph neural network, second features of at least one level for the node, based on the node and each target node that has an edge relationship with the node, where each level corresponds to one feature extraction layer of the first graph neural network;
- for each node, fuse the first feature and each second feature corresponding to the node to obtain the node feature of the node.
- When the second features of at least one level of a node are obtained by the first graph neural network based on the node in the first graph structure and each target node that has an edge relationship with the node, the node feature determination module is specifically configured to:
- for each node, determine the weight of each associated feature of the node through the first graph neural network, based on the associated features of the node, where the associated features include the initial feature of the node and the initial feature of each target node corresponding to the node;
- for each node, perform weighted fusion of the associated features of the node through the first graph neural network, based on the weights of those associated features, to obtain a second feature of one level for the node;
- where the second feature of any level other than the first level is obtained based on the second feature of the preceding level.
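- The feature extraction module is further configured to: acquire at least one second omics data belonging to the same target object as the first omics data but to different omics, and extract the data features corresponding to each second omics data.
- When obtaining the medical analysis result based on the node features of each node, the analysis result determination module is specifically configured to:
- determine the medical analysis result of the target object based on the node features of each node and the data features corresponding to each second omics data.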
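- Each second omics data includes at least two second omics features.
- When obtaining the node features of each node in the first graph structure through the first graph neural network based on the first graph structure, the node feature determination module is specifically configured to obtain those node features through the first graph neural network corresponding to the omics to which the first omics data belongs.
- When extracting the data features corresponding to any second omics data, the feature extraction module is specifically configured to: determine a second correlation between different second omics features of that second omics data; construct a second graph structure based on the second omics features and each second correlation; and
- obtain, through the second graph neural network corresponding to the omics to which the second omics data belongs, the node features of each node corresponding to the second omics data, so as to obtain the data features corresponding to the second omics data;
- where the data features include the node features of each node corresponding to the second omics data.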
- Obtaining the node features of each node in the first graph structure, and obtaining the medical analysis result based on the node features of each node, are both performed by an analysis result prediction model, where the analysis result prediction model is obtained by training an initial neural network model on sample omics data.
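- The apparatus further includes an information providing module configured to: determine the importance parameter value corresponding to each first omics feature, and provide the medical analysis result and those importance parameter values to the user.
- The importance parameter value of each first omics feature is determined as follows: for each sample omics data, the importance parameter value of the node corresponding to each omics feature in that sample's graph structure is determined based on the sample's medical analysis result;
- for any node, the importance parameter value of the node is obtained based on the importance parameter values of that node across all sample omics data, and is used as the importance parameter value of the omics feature corresponding to the node.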
- The analysis result prediction model is obtained by:
- acquiring a training data set and an initial neural network model, where the training data set includes the sample omics data and the label corresponding to each sample omics data, and the label represents the real medical analysis result;
- dividing the training data set into different sub-data sets, and iteratively training the initial neural network model on the different sub-data sets until a preset training end condition is met;
- fusing the model parameters of the neural network models obtained at the end of each training run, and using the fused model parameters as the model parameters of the analysis result prediction model.
- When acquiring the first omics data to be processed, the data acquisition module is specifically configured to:
- acquire initial omics data, where the initial omics data includes at least two initial omics features;
- acquire associated omics features of the initial omics data, where the associated omics features and the initial omics data belong to the same target object, and the associated omics features include at least one of case omics features or radiomics features;
- fuse each initial omics feature with the associated omics features to obtain a fused omics feature corresponding to each initial omics feature, and use it as a first omics feature.
- the medical analysis results include at least one of disease identification results, disease typing results, or survival prediction results.
- the graph neural network-based omics data processing apparatus of the embodiments of the present application can execute the graph neural network-based omics data processing method provided by the embodiments of the present application, and the implementation principles thereof are similar, which will not be repeated here.
- The graph neural network-based omics data processing apparatus may be a computer program (including program code) running on a computer device; for example, the apparatus may be application software. The apparatus can be used to execute the corresponding steps of the methods provided in the embodiments of the present application.
- The graph neural network-based omics data processing apparatus may be implemented by a combination of software and hardware.
- For example, the graph neural network-based omics data processing apparatus provided by the embodiments of the present application may be a processor in the form of a hardware decoding processor, programmed to execute the graph neural network-based omics data processing method provided by the embodiments of the present application.
- The processor in the form of a hardware decoding processor may use one or more Application Specific Integrated Circuits (ASIC), Digital Signal Processors (DSP), Programmable Logic Devices (PLD), Complex Programmable Logic Devices (CPLD), Field-Programmable Gate Arrays (FPGA), or other electronic components.
- the graph neural network-based omics data processing apparatus may be implemented in software.
- FIG. 3 shows the graph neural network-based omics data processing apparatus 60 stored in the memory.
- The graph neural network-based omics data processing apparatus 60 can be software in the form of programs, plug-ins, and the like, and includes a series of modules: a data acquisition module 601, a correlation determination module 602, a graph structure building module 603, a node feature determination module 604, and an analysis result determination module 605. These modules are used to implement the graph neural network-based omics data processing method provided by the embodiments of the present application.
- the electronic device 2000 shown in FIG. 4 includes: a processor 2001 and a memory 2003 .
- the processor 2001 is connected to the memory 2003, for example, through the bus 2002.
- the electronic device 2000 may also include a transceiver 2004 . It should be noted that, in practical applications, the transceiver 2004 is not limited to one, and the structure of the electronic device 2000 does not constitute a limitation to the embodiments of the present application.
- the processor 2001 is used in the embodiments of the present application to implement the functions of the modules shown in FIG. 3 .
- the processor 2001 may be a CPU, general purpose processor, DSP, ASIC, FPGA or other programmable logic device, transistor logic device, hardware component or any combination thereof. It may implement or execute the various exemplary logical blocks, modules and circuits described in connection with this disclosure.
- the processor 2001 may also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and the like.
- the bus 2002 may include a path to communicate information between the components described above.
- the bus 2002 may be a PCI bus, an EISA bus, or the like.
- the bus 2002 can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is used in FIG. 4, but it does not mean that there is only one bus or one type of bus.
- The memory 2003 can be ROM or another type of static storage device that can store static information and computer programs, RAM or another type of dynamic storage device that can store information and computer programs, EEPROM, CD-ROM or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store a desired computer program in the form of a data structure and that can be accessed by a computer, but is not limited to these.
- the memory 2003 is used for storing a computer program for executing the application program of the solution of the present application, and the execution is controlled by the processor 2001 .
- the processor 2001 is configured to execute the computer program of the application program stored in the memory 2003, so as to realize the actions of the graph neural network-based omics data processing apparatus provided in the embodiment shown in FIG. 3 .
- An embodiment of the present application provides an electronic device, including a processor and a memory: the memory is configured to store a computer program which, when executed by the processor, causes the processor to perform any of the methods in the foregoing embodiments.
- An embodiment of the present application provides a computer-readable storage medium used to store a computer program which, when run on a computer, enables the computer to execute any one of the methods in the foregoing embodiments.
- An embodiment of the present application further provides a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium.
- The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the methods provided in the various optional implementations described above.
Abstract
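A graph neural network-based omics data processing method, apparatus, device, and medium, relating to technical fields such as medical care, artificial intelligence, and cloud data. The method includes: acquiring first omics data of a target object, and extracting at least two first omics features from the first omics data (S101); determining a first correlation between different omics features among the at least two first omics features (S102); constructing, based on the at least two first omics features and the first correlation, a first graph structure corresponding to the first omics data, where the first graph structure contains at least two nodes, each node represents one first omics feature in the first omics data, the first graph structure contains at least one edge connecting the at least two nodes, and the edge represents the first correlation corresponding to the two nodes it connects (S103); obtaining, based on the first graph structure and through a first graph neural network, the node feature of each node in the first graph structure, the node feature having at least one dimension (S104); and performing medical analysis on the target object based on the node features of the nodes to obtain a medical analysis result corresponding to each of the at least one dimension, where the medical analysis includes disease diagnosis, disease typing, and survival prediction for the target object, and the medical analysis results include, for each dimension, the probability that the target object has a disease, the probability that the target object's disease belongs to a given disease category, and the survival probability of the target object (S105).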
Description
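This application claims priority to Chinese Patent Application No. 202011379315.3, filed with the China Patent Office on November 30, 2020 and entitled "Graph neural network-based omics data processing method, apparatus, device, and medium", which is incorporated herein by reference in its entirety.
This application relates to technical fields such as medical care, artificial intelligence, and cloud data, and in particular to a graph neural network-based omics data processing method, apparatus, device, and medium.
Background
Gene expression and protein expression in the human body can differ greatly across stages of the life cycle and stages of disease development. Omics (genomics, transcriptomics, proteomics, metabolomics, and so on) is therefore an important tool for systematically studying biological laws. Because omics can also reflect the life-cycle stage of the organism and the state of disease development, omics data plays a vital role in medical care.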
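Technical Content
An embodiment of the present application provides a graph neural network-based clinical omics data processing method, the method including:
acquiring first omics data of a target object;
extracting at least two first omics features from the first omics data;
determining a first correlation between different omics features among the at least two first omics features;
constructing, based on the at least two first omics features and the first correlation, a first graph structure corresponding to the first omics data, where the first graph structure contains at least two nodes, each node represents one first omics feature in the first omics data, the first graph structure contains at least one edge connecting the at least two nodes, and the edge represents the first correlation corresponding to the two nodes it connects;
obtaining, based on the first graph structure and through a first graph neural network, the node feature of each node in the first graph structure, the node feature having at least one dimension; and
performing medical analysis on the target object based on the node features of the nodes to obtain a medical analysis result corresponding to each of the at least one dimension, where the medical analysis includes disease diagnosis, disease typing, and survival prediction for the target object, and the medical analysis results include, for each dimension, the probability that the target object has a disease, the probability that the target object's disease belongs to a given disease category, and the survival probability of the target object.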
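In another aspect, an embodiment of the present application provides a graph neural network-based clinical omics data processing apparatus, the apparatus including:
a data acquisition module, configured to acquire first omics data of a target object and extract at least two first omics features from the first omics data;
a correlation determination module, configured to determine a first correlation between different omics features among the at least two first omics features;
a graph structure building module, configured to construct, based on the at least two first omics features and the first correlation, a first graph structure corresponding to the first omics data, where the first graph structure contains at least two nodes, each node represents one first omics feature in the first omics data, the first graph structure contains at least one edge connecting the at least two nodes, and the edge represents the first correlation corresponding to the two nodes it connects;
a node feature determination module, configured to obtain, based on the first graph structure and through a first graph neural network, the node feature of each node in the first graph structure, the node feature having at least one dimension; and
an analysis result determination module, configured to perform medical analysis on the target object based on the node features of the nodes and obtain a medical analysis result corresponding to each of the at least one dimension, where the medical analysis includes disease diagnosis, disease typing, and survival prediction for the target object, and the medical analysis results include, for each dimension, the probability that the target object has a disease, the probability that the target object's disease belongs to a given disease category, and the survival probability of the target object.
In yet another aspect, an embodiment of the present application provides an electronic device including a processor and a memory, where the memory is configured to store a computer program which, when executed by the processor, causes the processor to perform the graph neural network-based omics data processing method described above.
In still another aspect, an embodiment of the present application provides a computer-readable storage medium used to store a computer program which, when run on a computer, enables the computer to perform the graph neural network-based omics data processing method described above.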
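To describe the technical solutions in the embodiments of the present application more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below.
FIG. 1a is a schematic flowchart of a graph neural network-based omics data processing method provided by an embodiment of the present application;
FIG. 1b is a detailed flowchart of step S104 of an embodiment of the present application, in which the node features of the nodes in the first graph structure are obtained through the first graph neural network based on the first graph structure;
FIG. 1c is a detailed flowchart of an embodiment of the present application in which, for each node in the first graph structure, second features of at least one level are obtained for the node;
FIG. 1d is a flowchart of the graph neural network-based clinical omics data processing method of an embodiment of the present application;
FIG. 2a is a schematic diagram of the principle of a graph neural network-based omics data processing method provided by an embodiment of the present application;
FIG. 2b is a schematic diagram of an edge matrix provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a graph neural network-based omics data processing apparatus provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.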
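Detailed Description
Embodiments of the present application are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the present application, and shall not be construed as limiting it.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the specification of the present application refers to the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It should be understood that when an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. In addition, "connected" or "coupled" as used herein may include wireless connection or wireless coupling. The expression "and/or" as used herein includes all or any unit and all combinations of one or more of the associated listed items.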
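As omics data has come to play a vital role in medical care, a number of statistical and machine learning methods have been applied to omics data for disease diagnosis, typing, and prediction. Statistical methods, however, focus on analyzing differential proteins, require a great deal of manual intervention, and cannot produce a clear classification or typing boundary. Machine learning-based omics analysis typically proceeds as follows: sample omics features are acquired together with the sample category labels assigned to them by doctors; all sample omics features are divided in a certain proportion into a training set, a validation set, and a test set; the model is trained with the training-set features as input and the corresponding category labels as the supervision signal; the optimal model parameters are selected according to performance on the validation set to obtain the final model; and disease prediction is then performed on omics data with the final model. Machine learning-based omics data analysis has nonetheless been found to have the following shortcomings:
(1) The development of each disease has its own cascade regulation network, in which different features are interrelated and regulate one another. Machine learning models, however, usually predict disease from each individual feature or from random combinations of features, without considering the regulatory relationships that naturally exist among the omics features. They therefore cannot explain the true pathogenic mechanism, their interpretability is weak, and their prediction accuracy is limited.
(2) Individual omics features, or random combinations of them, are usually affected by differences in experimental batches and experimental conditions. Models built on these features are therefore strongly affected by data batch and generalize poorly.
(3) Because regulatory networks in biology act through cascade amplification, high-level regulatory factors do not differ significantly between samples of different categories, whereas the regulated functional proteins do. Machine learning models usually detect the clearly differing functional proteins as biomarkers and overlook the regulatory factors, which are of greater clinical significance.
(4) Related machine learning work has concentrated on genomics and transcriptomics and paid less attention to proteomics, even though proteomics has irreplaceable advantages for disease diagnosis, typing, and prediction. Neglecting proteomics is a major obstacle to precision medicine.
In summary, current research on omics data does not make full use of the cascade regulation relationships among omics features and cannot adequately reveal the real biological process of disease development. Both the interpretability and the accuracy of predictions obtained with current approaches need improvement.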
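On this basis, embodiments of the present application provide a graph neural network-based omics data processing method, apparatus, device, and medium, aiming to solve some or all of the technical problems described above. In the embodiments of the present application, after the omics data to be processed is acquired, it can be processed with artificial intelligence technology to obtain the corresponding medical prediction result. Specifically, machine learning technology can be used to obtain a feature representation for each omics feature in the omics data to be processed, and the final medical prediction result can then be obtained from those representations.
Artificial intelligence (AI) is the theory, methods, technology, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the capabilities of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills, and how they can reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied across all fields of artificial intelligence. Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
In some embodiments, the data processing and computation involved in the embodiments of the present application can be carried out by means of cloud computing. Cloud computing, in the narrow sense, refers to a delivery and usage model for IT infrastructure, in which required resources are obtained over a network in an on-demand, easily scalable way; in the broad sense it refers to a delivery and usage model for services, in which required services are obtained over a network in an on-demand, easily scalable way. Such services may relate to IT, software, and the Internet, or may be other services. Cloud computing is the product of the development and convergence of traditional computer and network technologies such as grid computing, distributed computing, parallel computing, utility computing, network storage technologies, virtualization, and load balancing.
Cloud computing has developed rapidly with the growth of the Internet, real-time data streams, and the diversity of connected devices, and driven by the demands of search services, social networks, mobile commerce, and open collaboration. Unlike earlier parallel distributed computing, the emergence of cloud computing is expected, conceptually, to drive revolutionary changes in the overall Internet model and in enterprise management models.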
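The technical solutions of the present application, and how they solve the above technical problems, are described in detail below with specific embodiments.
Several terms used in the present application are first introduced and explained:
Omics: an important tool for systematically studying biological laws, mainly including genomics, proteomics, metabolomics, transcriptomics, lipidomics, immunomics, radiomics, ultrasomics, and so on. Omics features are the features of the various omics that can reflect biological laws.
Biomarker: a biochemical indicator that can mark changes, or possible changes, in the structure or function of systems, organs, tissues, cells, and subcellular structures. Biomarkers have a very wide range of uses: they can be used for disease diagnosis, for determining disease stage, or for evaluating the safety and efficacy of new drugs or new therapies in a target population.
Signaling pathway: the phenomenon in which, when a certain reaction is to occur in a cell, a signal conveys information from outside the cell to inside the cell and the cell responds according to that information. In the embodiments of the present application, when an omics feature interacts with other omics features while performing its function, that omics feature and the other omics features constitute a signaling pathway.
The technical solutions of the present application, and how they solve the above technical problems, are described in detail below with specific embodiments. The following specific embodiments can be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments. The embodiments of the present application are described below with reference to the accompanying drawings.
In some embodiments, the method provided by the embodiments of the present application is executed by an electronic device, which may be a server or a terminal device. Specifically, the method may be executed by a terminal device, or through data interaction between a server and a terminal device. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing cloud computing services. The terminal device may be, without limitation, a smartphone, tablet computer, notebook computer, or desktop computer. The terminal device and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in the present application. When the method is executed through data interaction between a server and a terminal device, the terminal device may first send the omics data to be processed to the server; the server performs medical analysis on the received omics data and returns the medical analysis result to the terminal device, which then presents the result to the user.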
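FIG. 1a shows a schematic flowchart of a graph neural network-based omics data processing method provided in an embodiment of the present application. As shown in FIG. 1a, the method includes:
Step S101: acquire first omics data of a target object, and extract at least two first omics features from the first omics data.
The first omics data of the target object is the omics data on which medical analysis is to be performed. It includes at least two first omics features, which all belong to the same category (for example, all belong to genomics) and to the same target object, but which differ from one another. In some embodiments, the type of target object is not limited; for example, the target object may be a human or an animal. In one example, suppose some first omics data corresponds to genomics and includes gene 1 to gene 10 belonging to person A; gene 1 to gene 10 are then different genes.
Step S102: determine a first correlation between different first omics features among the at least two first omics features.
In practice, different omics features usually do not perform their functions independently; they do so together with other omics features. That is, omics features are interrelated and regulate one another. Accordingly, in the embodiments of the present application the first correlation between different first omics features can be determined, and first omics features that perform similar functions can then be associated with one another based on that correlation.
When determining the first correlation between different first omics features, the correlation matrix between different omics features can be computed with Weighted Gene Co-Expression Network Analysis (WGCNA). The correlation matrix can then be binarized by setting a threshold, and the binarized correlation matrix is called the edge matrix. If the correlation between two first omics features is not less than the threshold, the two features perform similar functions and interact with each other (that is, they can form a signaling pathway), and the element of the correlation matrix representing their correlation is set to 1. If the correlation between two first omics features is less than the threshold, their correlation is low, and the corresponding element of the correlation matrix is set to 0.
In the embodiments of the present application, computing the correlation between different first omics features with WGCNA gives first omics features that perform similar functions a higher correlation. Further, once the correlation matrix between the different first omics features has been obtained, binarizing it highlights the correlations between different first omics features more clearly.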
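The thresholding step can be sketched as follows. This is only an illustration under stated assumptions: plain Pearson correlation stands in for WGCNA (which additionally applies soft-threshold powers and topological overlap), the diagonal is set to 0, and the function names, toy values, and threshold of 0.8 are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def edge_matrix(features, threshold):
    """features[k] holds the values of omics feature k across samples.
    Returns a K x K 0/1 matrix: 1 where correlation >= threshold."""
    k = len(features)
    edges = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            # Assumption: no self-edges are recorded on the diagonal.
            if i != j and pearson(features[i], features[j]) >= threshold:
                edges[i][j] = 1
    return edges

# Toy data: 3 omics features measured on 5 samples (hypothetical values).
toy = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.1, 3.9, 6.2, 8.0, 9.9],   # tracks feature 0 closely
    [5.0, 1.0, 4.0, 2.0, 3.0],   # weakly related to the others
]
E = edge_matrix(toy, threshold=0.8)
```

Only features 0 and 1 exceed the threshold here, so the edge matrix contains a single symmetric pair of ones.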
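Step S103: based on the at least two first omics features and the first correlation, construct a first graph structure corresponding to the first omics data, where the first graph structure contains at least two nodes, each node represents one first omics feature in the first omics data, the first graph structure contains at least one edge connecting the at least two nodes, and the edge represents the first correlation between the two first omics features corresponding to the two nodes it connects.
A graph structure includes nodes and the edges connecting them. In the embodiments of the present application, each node in the graph structure represents one first omics feature, and an edge between two nodes represents the first correlation between the two first omics features corresponding to those nodes. Accordingly, when constructing the first graph structure corresponding to the first omics data, the nodes of the first graph structure can be obtained from the first omics features included in the first omics data, and the first correlations between different first omics features then determine which pairs of nodes are connected by edges, yielding the first graph structure.
In some embodiments of the present application, constructing the first graph structure corresponding to the first omics data based on the at least two first omics features and each first correlation includes:
for any two first omics features, if the first correlation between them is greater than or equal to a set value, establishing an edge between the two nodes corresponding to the two first omics features.
In some embodiments, once the first correlations between different first omics features are known, for any two first omics features in the first omics data, if the first correlation between them is determined to be greater than or equal to the set value, the two features perform similar functions and are highly correlated, and an edge can be established between the two corresponding nodes in the first graph structure.
In some embodiments, the first correlations between different first omics features can be represented by the edge matrix described above. When constructing the first graph structure, for any two first omics features, if the element of the edge matrix representing their first correlation is 1, an edge can be established between the two corresponding nodes; if that element is 0, no edge is established between the two corresponding nodes.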
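As a small illustration of the rule above, the sketch below turns a 0/1 edge matrix into a neighbour list per node. The function name `build_graph` and the example matrix are hypothetical, and the adjacency-list representation is one convenient choice rather than anything the text prescribes.

```python
def build_graph(edge_matrix):
    """Return {node: list of neighbour nodes} from a 0/1 edge matrix.

    Node i stands for the i-th first omics feature; an edge i-j exists
    exactly when edge_matrix[i][j] == 1.
    """
    k = len(edge_matrix)
    return {
        i: [j for j in range(k) if j != i and edge_matrix[i][j] == 1]
        for i in range(k)
    }

# Hypothetical 3-feature edge matrix: feature 0 is linked to 1 and 2.
E = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
graph = build_graph(E)
```

The resulting neighbour lists are what a graph neural network layer iterates over when it aggregates features along edges.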
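In the embodiments of the present application, constructing a graph structure from the omics data connects omics features that perform similar functions. The graph then reflects not only individual omics features but also the interactions between different omics features, which better reveals the pathogenic mechanism, simulates the biological process, and yields more accurate disease prediction.
Step S104: based on the first graph structure, obtain the node feature of each node in the first graph structure through the first graph neural network, the node feature having at least one dimension.
In some embodiments, when the node feature has multiple dimensions, it may be a sequence or an array.
The first graph neural network is the graph neural network corresponding to the omics to which the first omics data belongs. Its specific type can be preconfigured: it may be a graph convolutional network based on the attention mechanism (Graph Attention Network, GAT), or another graph neural network such as a graph convolutional network or a graph autoencoder network, which is not limited in the embodiments of the present application.
In some embodiments, once the first graph structure corresponding to the first omics data has been obtained, the feature of each first omics feature, that is, the node feature of each node in the first graph structure, can be obtained through the graph neural network corresponding to the first omics data.
In some embodiments of the present application, the method further includes:
extracting a first feature of each first omics feature.
FIG. 1b shows a detailed flowchart of step S104, in which the node features of the nodes in the first graph structure are obtained through the first graph neural network based on the first graph structure. As shown in FIG. 1b, step S104 specifically includes:
Step S1041: for each node in the first graph structure, the first graph neural network obtains second features of at least one level for the node, based on the node and each target node that has an edge relationship with the node;
Step S1042: for each node, the first feature and each second feature corresponding to the node are fused to obtain the node feature of the node.
In some embodiments, for each node in the first graph structure, a first feature representing the node can be extracted (that is, a feature representing the first omics feature itself). For each node, the target nodes that have an edge relationship with it can be determined, and the first graph neural network then performs feature extraction at least once on the first feature of the node, based on the first features of the target nodes and the first feature of the node, to obtain second features of at least one level for the node. The first feature is the feature of each node in the first graph structure before fusion with other features, and includes only a single first omics feature. In some embodiments, the first graph neural network may include at least one feature extraction layer (such as a GAT layer), with the output of each feature extraction layer corresponding to one second feature. The input of the first feature extraction layer is the first feature of each node in the first graph structure together with the edge relationships between the nodes; the input of every other feature extraction layer is the second feature of each node produced by the preceding feature extraction layer, together with the edge relationships between the nodes. The feature extraction layer may be an attention-based graph convolution (GAT) layer.
In some embodiments, after the first feature and second features of each node have been obtained, for each node the first feature and the at least one second feature can be fused, the fused feature is used as the node feature of the node, and medical analysis is then performed based on the node features of the nodes to obtain the corresponding medical analysis result.
When fusing the first feature and the at least one second feature of each node, the first feature and each second feature can be mapped to the same node dimension through their respective fully connected layers, and the mapped features are then fused by concatenation. The fused feature is used as the node feature of the node.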
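The map-then-concatenate fusion can be sketched for a single node as follows. The input dimensions, the target dimension of 4, the random weights, and the helper names are all hypothetical; in a trained model the fully connected layers would carry learned parameters.

```python
import random

def linear(vec, weights, bias):
    """Fully connected layer; weights has shape out_dim x in_dim."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def make_layer(in_dim, out_dim, rng):
    """Hypothetical untrained layer with small random weights, zero bias."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim

def fuse_node_features(levels, layers):
    """Map each level's feature to a common dimension, then concatenate."""
    fused = []
    for feat, (weights, bias) in zip(levels, layers):
        fused.extend(linear(feat, weights, bias))
    return fused

rng = random.Random(0)
first = [0.7]                      # first feature of the node
second_l1 = [0.2, -0.1, 0.4]       # second feature, level 1
second_l2 = [0.5, 0.3]             # second feature, level 2
levels = [first, second_l1, second_l2]
layers = [make_layer(len(f), 4, rng) for f in levels]   # one layer per level
node_feature = fuse_node_features(levels, layers)       # 3 levels x 4 dims
```

Giving each level its own fully connected layer lets features of different sizes be brought to one common dimension before concatenation.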
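In the embodiments of the present application, because the second feature of each node is obtained by fusing the features of the nodes that form a signaling pathway, the second feature of each node incorporates the omics features of other nodes (that is, the second feature is a signaling-pathway-level feature). The node feature obtained by fusing the first and second features of each node therefore contains both single-omics-feature-level information (the first feature) and signaling-pathway-level information (the second features). It better represents the omics features of the target object's first omics data, so medical analysis based on the first omics data produces more accurate results.
In the embodiments of the present application, FIG. 1c shows a detailed flowchart of how, for each node in the first graph structure, the first graph neural network obtains second features of at least one level for the node based on the node and each target node that has an edge relationship with the node. As shown in FIG. 1c, this includes:
Step S1043: acquire the initial feature of each node in the first graph structure, where, when determining the second feature of the first level for each node, the initial feature of each node is its first feature; when determining the second feature of any level other than the first, the initial feature of each node is its second feature of the preceding level;
Step S1044: for each node, determine the weight of each associated feature of the node through the first graph neural network, based on the associated features of the node, where the associated features include the initial feature of the node and the initial features of the target nodes that have an edge relationship with the node;
Step S1045: for each node, perform weighted fusion of the associated features of the node through the first graph neural network, based on the weights of those associated features, to obtain a second feature of one level for the node;
where, if a node has second features of at least two levels, the second feature of any level other than the first is obtained based on the second feature of the preceding level.
In some embodiments, the initial feature of each node in the graph structure can be determined. If a node has second features of at least two levels, the second feature of any level other than the first is obtained based on the second feature of the preceding level. That is, when determining the first-level second feature of each node, the initial feature of each node is its first feature, and when determining the second feature of any other level, the initial feature of each node is its second feature of the preceding level.
In practice, for any node, if there is an edge between that node and another node, the first omics features corresponding to the two nodes perform similar functions and can form a signaling pathway. However, the nodes differ in how important they are to performing the function, and a weight can be used to represent the importance of each node.
In some embodiments, for each node, the target nodes connected to it by an edge can be determined. Based on the initial feature of the node and the initial features of its target nodes (that is, the associated features of the node), the weight of each associated feature is determined through the graph convolutional network. The initial feature of the node and the initial features of the target nodes are then weighted by their respective weights, the weighted initial features are fused, and the fused feature is used as the second feature of one level for the node.
In one example, suppose the first graph neural network includes two GAT layers and the first graph structure includes three nodes (node 1 to node 3), where node 1 has an edge to each of node 2 and node 3, and node 2 and node 3 each have an edge only to node 1. The first features of node 1 to node 3 can be determined separately. For node 1, its target nodes are node 2 and node 3. Based on the first feature of node 1 (its initial feature) and the first features of node 2 and node 3 (their initial features), the first GAT layer of the graph convolutional network determines the weights of the first features of node 1 to node 3, then weights and fuses those first features accordingly to obtain the first-level second feature of node 1. The first-level second features of node 2 and node 3 are obtained in the same way. Further, for node 1, the first-level second features of node 1 to node 3 are taken as the associated features of node 1; the graph convolutional network determines the weight of each associated feature, and the second GAT layer performs weighted fusion of the associated features of node 1 based on those weights to obtain the second-level second feature of node 1. The second-level second features of node 2 and node 3 are obtained in the same way.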
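The three-node, two-layer example can be traced with a heavily simplified attention layer. This sketch assumes scalar node features, fixed attention parameters `a_self` and `a_nbr` (learned in a real GAT), and no learned linear transform, so it shows the weighting and fusion pattern only and is not the model described in the text.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def attention_layer(features, neighbours, a_self, a_nbr):
    """One simplified attention layer over scalar node features.

    For node i the associated features are its own feature plus those of
    its neighbours; softmax-normalised scores act as the fusion weights.
    """
    out = {}
    for i, h_i in features.items():
        assoc = [i] + neighbours[i]
        scores = [leaky_relu(a_self * h_i + a_nbr * features[j]) for j in assoc]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out[i] = sum(w * features[j] for w, j in zip(weights, assoc))
    return out

# Node 1 is linked to nodes 2 and 3; nodes 2 and 3 are linked only to node 1.
neighbours = {1: [2, 3], 2: [1], 3: [1]}
first = {1: 1.0, 2: 2.0, 3: 3.0}          # hypothetical first features
second_l1 = attention_layer(first, neighbours, a_self=0.5, a_nbr=0.5)
second_l2 = attention_layer(second_l1, neighbours, a_self=0.5, a_nbr=0.5)
```

Because the weights of each node sum to 1, every second feature is a convex combination of the associated features that produced it, and the second layer consumes the first layer's output exactly as the example describes.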
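In the embodiments of the present application, because the second feature of each node is obtained by weighted fusion of the node's feature with the features of the nodes connected to it, omics features with similar functions (those forming a signaling pathway) are fused together. The resulting second feature is a signaling-pathway-level feature, which allows more attention to be paid to high-level regulatory factors.
Step S105: based on the node features of the nodes, perform medical analysis on the target object to obtain a medical analysis result corresponding to each of the at least one dimension, where the medical analysis includes disease diagnosis, disease typing, and survival prediction for the target object, and the medical analysis results include, for each dimension, the probability that the target object has a disease, the probability that the target object's disease belongs to a given disease category, and the survival probability of the target object.
In some embodiments, once the node features of the nodes have been obtained, medical analysis can be performed based on them to obtain the corresponding medical analysis result. The specific kinds of medical analysis can be preconfigured and are not limited in the embodiments of the present application. For example, at least one of disease identification, disease typing, or survival prediction can be performed based on the node features, in which case the medical analysis result includes at least one of a disease identification result, a disease typing result, or a survival prediction result.
In the embodiments of the present application, the omics features in the omics data to be processed can be structured as a graph according to the correlations between different omics features. This effectively simulates the biological interrelation and regulation among omics features and better represents their state. The node features of the nodes in the graph structure are then obtained through the graph neural network, and the medical analysis result is obtained from those node features. Because the graph-structured omics features effectively simulate the biological interrelation and regulation among omics features, the node feature obtained for each node through the graph neural network incorporates the features of other nodes. It is a comprehensive signaling-pathway-level feature that reflects the interrelation and regulation among omics features and carries richer information, so the medical analysis result obtained from the node features is more accurate.
In some embodiments of the present application, FIG. 1d shows a flowchart of the graph neural network-based clinical omics data processing method. As shown in FIG. 1d, the method further includes:
Step S106: acquire at least one second omics data, where the first omics data and the at least one second omics data all belong to different omics, and the at least one second omics data and the first omics data belong to the same target object;
Step S107: extract the data features corresponding to each second omics data;
Obtaining the medical analysis result based on the node features of the nodes includes:
Step S1051: determine the medical analysis result of the target object based on the node features of the nodes and the data features corresponding to each second omics data.
Each second omics data belongs to the same target object as the first omics data, but to a different category of omics. For example, the first omics data is genomics, the second omics data is proteomics, and both belong to person A.
When the medical analysis result of the target object is determined jointly from the data features of each second omics data and the node features of the nodes, the first omics data and each second omics data may differ in how important they are to the result. Different weights can be set to represent this importance. Accordingly, when determining the medical analysis result, the data features of each second omics data and the node features of the nodes can be fused by weighting them with the weights of the first omics data and of each second omics data, and the medical analysis result of the target object is then determined from the fused feature.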
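The weighted fusion across omics can be sketched as below. The text does not say whether the weighted features are concatenated or summed, so concatenation is an assumption here, as are the function name, the feature values, and the weights 0.6 and 0.4.

```python
def weighted_fusion(feature_sets, weights):
    """Scale each omics feature vector by its weight, then concatenate.

    feature_sets[m] is the flattened feature vector of omics m;
    weights[m] expresses how much omics m matters to the analysis.
    """
    if len(feature_sets) != len(weights):
        raise ValueError("one weight is required per omics")
    fused = []
    for feats, w in zip(feature_sets, weights):
        fused.extend(w * f for f in feats)
    return fused

# Hypothetical node features (first omics) and data features (second omics).
genomics = [0.2, 0.8]
proteomics = [1.0, 0.5, 0.1]
fused = weighted_fusion([genomics, proteomics], [0.6, 0.4])
```

In a trained model the per-omics weights would more plausibly be learned than fixed by hand.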
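In the embodiments of the present application, because other omics data that belongs to the same target object but to a different category than the first omics data is also fused when determining the medical analysis result for the target object, the medical analysis is more comprehensive and precise, and the accuracy of the medical analysis result is improved.
In some embodiments of the present application, each second omics data includes at least two second omics features;
obtaining the node features of the nodes in the first graph structure through the first graph neural network based on the first graph structure includes:
obtaining the node features of the nodes of the first graph structure, based on the first graph structure, through the first graph neural network corresponding to the omics to which the first omics data belongs;
for any second omics data, extracting the data features corresponding to the second omics data includes:
determining a second correlation between different second omics features among the at least two second omics features of the second omics data;
constructing a second graph structure corresponding to the second omics data based on the at least two second omics features and each second correlation;
obtaining, based on the second graph structure and through the second graph neural network corresponding to the omics to which the second omics data belongs, the node features of the nodes corresponding to the second omics data, so as to obtain the data features corresponding to the second omics data, where the data features include the node features of the nodes corresponding to the second omics data.
The second omics data includes at least two second omics features, which belong to a different category of omics than the first omics features. In some embodiments, the graph neural network corresponding to each kind of omics can be preconfigured, for example a graph neural network for genomics and a graph neural network for proteomics. Because the graph neural network for each kind of omics is trained on sample omics features of a different category, the network parameters of the graph neural networks for the different omics differ from one another.
Accordingly, once the first omics data and its first graph structure have been obtained, the node features of the nodes of the first graph structure can be obtained through the first graph neural network corresponding to the omics to which the first omics data belongs. For any second omics data, the second correlations between different second omics features among its at least two second omics features can be determined, and a second graph structure corresponding to the second omics data is then constructed from the second omics features and those second correlations. Each node of the second graph structure represents one second omics feature, and each edge represents the second correlation between the two second omics features corresponding to the two nodes of the edge. The node features of the nodes in the second graph structure, that is, the data features corresponding to the second omics data, can then be obtained through the second graph neural network corresponding to the omics to which the second omics data belongs.
In some embodiments of the present application, obtaining the node features of the nodes in the first graph structure through the first graph neural network based on the first graph structure, and obtaining the medical analysis result based on the node features of the nodes, are both performed by an analysis result prediction model, where the analysis result prediction model is obtained by training an initial neural network model on sample omics data.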
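In some embodiments, the sample omics data and an initial neural network model can be acquired, and the initial neural network model is trained on the sample omics data to obtain the analysis result prediction model. When determining the medical analysis result for the first omics data to be processed, the first graph structure corresponding to the first omics data is determined first and is then input to the analysis result prediction model. Based on the first graph structure, the model obtains the node features of the nodes in the first graph structure through the first graph neural network, then obtains and outputs the medical analysis result based on those node features.
In some embodiments of the present application, the analysis result prediction model is obtained as follows:
acquiring a training data set and an initial neural network model, where the training data set includes the sample omics data and the annotation label corresponding to each sample omics data, and the annotation label represents the real medical analysis result;
dividing the training data set into different sub-data sets;
iteratively training the initial neural network model on the different sub-data sets until a preset training end condition is met;
fusing the model parameters of the initial neural network model obtained at the end of each training run, and using the fused model parameters as the model parameters of the analysis result prediction model.
In some embodiments, when training the initial neural network model on the sample omics data, a training data set and an initial neural network model can be acquired, where the training data set includes the sample omics data and the annotation label corresponding to each sample omics data, and the annotation label represents the real medical analysis result. The training data set can then be divided into different sub-data sets. For each sub-data set, the initial neural network model is iteratively trained until the preset training end condition is met, yielding the trained model corresponding to that sub-data set. Once the trained models corresponding to all sub-data sets have been obtained, their model parameters can be fused, and the fused model parameters are used as the model parameters of the analysis result prediction model.
In some embodiments, in practice, multiple sample omics data can be acquired at once and randomly divided into 5 subsets serving as 5 sub-data sets. Each time, one sub-data set is taken as the test set and the remaining 4 sub-data sets are used as the training set to train the initial neural network model. After each of the 5 sub-data sets has served as the test set in turn, 5 trained models are obtained. For each network parameter of the analysis result prediction model, the values of that parameter in the 5 trained models are averaged, and the average is used as the parameter value of the analysis result prediction model. In other words, the initial neural network model is trained with five-fold cross-validation to obtain the analysis result prediction model.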
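The five-fold procedure with parameter averaging can be sketched on a toy model. A one-variable linear regression fitted by gradient descent stands in for the neural network here purely for illustration; the data, learning rate, epoch count, and function names are hypothetical.

```python
import random

def train(samples, epochs=2000, lr=0.05):
    """Fit y = w * x + b by gradient descent; returns the parameters."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        gw = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        gb = sum(2 * (w * x + b - y) for x, y in samples) / n
        w, b = w - lr * gw, b - lr * gb
    return {"w": w, "b": b}

def five_fold_average(samples, folds=5, seed=0):
    """Train once per fold on the other folds, then average the parameters."""
    data = samples[:]
    random.Random(seed).shuffle(data)
    subsets = [data[i::folds] for i in range(folds)]
    models = []
    for held_out in range(folds):
        # subsets[held_out] would serve as the test set for this run.
        train_set = [s for i, sub in enumerate(subsets)
                     if i != held_out for s in sub]
        models.append(train(train_set))
    return {k: sum(m[k] for m in models) / folds for k in models[0]}

# Toy labelled data generated from y = 2x + 1.
samples = [(x / 10, 2 * (x / 10) + 1) for x in range(20)]
params = five_fold_average(samples)
```

Element-wise averaging works cleanly for this convex toy problem; for deep networks trained from different starting points it is a much stronger assumption.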
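The training end condition may be that the value of the loss function of the initial neural network model converges. The value of the loss function represents the difference between the medical analysis result predicted for the sample omics data and its real medical analysis result. When the value of the loss function converges, the accuracy of the current model meets the requirement and training can end.
In some embodiments, when different types of medical analysis are required, the output of the analysis result prediction model differs, and so does the loss function used to train it. For example, if the medical analysis is disease diagnosis or disease typing, the prediction output by the initial neural network model is the predicted probability of the sample omics data for each category, and the model can be trained by minimizing the cross-entropy between the predicted probabilities of all sample omics data and their medical analysis result labels. If the medical analysis is survival prediction, the prediction output by the initial neural network model is the patient's risk score, and the model can be trained with the loss function of the Cox proportional hazards model.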
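Both loss functions can be written compactly. The sketch below assumes the standard negative Cox partial log-likelihood without tie handling and averages over observed events; the text names only "the loss function of cox", so these details and all example values are assumptions.

```python
import math

def cross_entropy(probs, label):
    """Cross-entropy for one sample: probs are class probabilities,
    label is the index of the true class."""
    return -math.log(probs[label])

def cox_loss(risks, times, events):
    """Negative Cox partial log-likelihood (no tie handling).

    risks[i]: predicted risk score; times[i]: follow-up time;
    events[i]: 1 if the event was observed, 0 if censored.
    """
    loss, n_events = 0.0, 0
    for i in range(len(risks)):
        if events[i] == 1:
            # Risk set: everyone still under observation at times[i].
            at_risk = [risks[j] for j in range(len(risks))
                       if times[j] >= times[i]]
            log_sum = math.log(sum(math.exp(r) for r in at_risk))
            loss -= risks[i] - log_sum
            n_events += 1
    return loss / max(n_events, 1)

ce = cross_entropy([0.25, 0.75], 1)
# A higher risk score for the patient who dies earlier gives a lower loss.
good = cox_loss([1.0, 0.0], [1.0, 2.0], [1, 1])
bad = cox_loss([0.0, 1.0], [1.0, 2.0], [1, 1])
```

The Cox loss depends only on the ordering of risk scores relative to event times, which is why a scalar risk output suffices for survival prediction.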
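In practice, the analysis result prediction model provided by the embodiments of the present application can be used to analyze omics data automatically and thereby obtain early diagnosis and prediction results for disease. In addition, because the model simulates the cascade regulation network in biology by structuring the omics data as a graph when analyzing it automatically, the model has higher interpretability and clinical applicability. Furthermore, compared with traditional analysis that relies mainly on statistical tests and manual judgment, the final result is obtained automatically, which saves the time spent analyzing omics data and waiting, avoids errors of manual judgment, and improves the accuracy of the medical analysis result.
In some embodiments of the present application, the method further includes:
determining the importance parameter value corresponding to each first omics feature;
providing the medical analysis result and the importance parameter value corresponding to each first omics feature to the user;
where the importance parameter value of each first omics feature is determined as follows:
for each sample omics data, determining, based on the medical analysis result of that sample omics data, the importance parameter value of the node corresponding to each omics feature in the graph structure of that sample omics data;
for any node, obtaining the importance parameter value of the node based on the importance parameter values of that node across all sample omics data, and using it as the importance parameter value of the omics feature corresponding to the node.
The importance parameter value of a first omics feature represents how important that feature is in the signaling pathway it forms. In some embodiments, the importance parameter values of the first omics features can also be acquired and provided to the user (for example, medical staff) together with the medical analysis result. From these values, medical staff can learn which omics features play an important role in the medical analysis result and propose a biological explanation, which helps the patient receive more accurate treatment and supports precision medicine.
In some embodiments, the importance parameter value of each first omics feature is the importance of the node corresponding to that feature in the first graph structure, and the importance of each node in the first graph structure can be determined based on the medical analysis results of the sample omics data. Specifically:
After the analysis result prediction model has been trained, for each sample omics data, the model can be used to obtain the medical analysis result for the sample and, in the course of obtaining it, the features of each node in the sample's graph structure (including the first feature and the second features of at least one level). A gradient computation (for example, differentiation) of the medical analysis result can then be performed with respect to each feature of each node, and the resulting values are summed to obtain the importance parameter value of each node in the sample's graph structure. The importance parameter values of the nodes in the graph structures of all sample omics data are obtained in the same way. Because the number and categories of sample omics features are the same across all sample omics data, the graph structures of all samples have the same number of nodes, and each node represents the same kind of sample omics feature. Accordingly, for any node in the graph structure, its importance parameter value can be obtained from the importance parameter values of that node across the graph structures of all sample omics data, and is used as the importance parameter value of the omics feature corresponding to the node. For example, the importance parameter values of the node across the graph structures of all sample omics data can be summed, and the sum used as the importance parameter value of the node.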
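The gradient-and-sum idea can be illustrated on a toy differentiable model. This sketch uses central finite differences instead of automatic differentiation, treats each node as having a single scalar feature, and sums absolute gradient values; taking absolute values is an assumption, since the text only says the computed values are summed. The model, weights, and samples are hypothetical.

```python
import math

def model(x, w):
    """Toy stand-in for the trained prediction model: sigmoid(w . x)."""
    return 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))

def node_gradients(x, w, eps=1e-6):
    """Central-difference gradient of the output w.r.t. each node feature."""
    grads = []
    for k in range(len(x)):
        hi = x[:k] + [x[k] + eps] + x[k + 1:]
        lo = x[:k] + [x[k] - eps] + x[k + 1:]
        grads.append((model(hi, w) - model(lo, w)) / (2 * eps))
    return grads

def importance(samples, w):
    """Sum per-sample gradient magnitudes into one score per node."""
    totals = [0.0] * len(w)
    for x in samples:
        for k, g in enumerate(node_gradients(x, w)):
            totals[k] += abs(g)   # assumption: magnitudes, not signed values
    return totals

w = [2.0, 0.0, -0.5]                         # node 1 has no influence
samples = [[0.1, 0.2, 0.3], [0.5, -0.4, 0.0]]
scores = importance(samples, w)
```

Nodes whose features the output is most sensitive to receive the highest scores, which is the basis for ranking omics features as candidate biomarkers.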
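Further, the important sample omics features can be identified based on the importance parameter value of each node, and signaling pathway enrichment can then be performed on them (for example, on the Metascape platform) to find omics features that can serve as signaling-pathway-level biomarkers.
In the embodiments of the present application, the importance parameter values of the sample omics features can be obtained by gradient computation. These values not only provide explanation and evidence for the medical analysis result, but can also be used to check and correct the neural network model. In addition, the important sample omics features that are identified can be used to find biomarkers that play an important role in disease prediction, so that diseases can be predicted and disease types determined more precisely.
In some embodiments of the present application, acquiring the first omics data to be processed includes:
acquiring initial omics data, where the initial omics data includes at least two initial omics features;
acquiring associated omics features of the initial omics data, where the associated omics features and the initial omics data belong to the same target object, and the associated omics features include at least one of case omics features or radiomics features;
fusing each initial omics feature with the associated omics features to obtain a fused omics feature corresponding to each initial omics feature, and using it as a first omics feature.
Associated omics features are features associated with the initial omics data and belonging to the same target object. Their specific category is not limited in the embodiments of the present application; for example, they may include at least one of the target object's case omics features or radiomics features.
In some embodiments, when acquiring the first omics data to be processed, at least two initial omics features can be acquired together with the associated omics features belonging to the same target object as the initial omics data. Each initial omics feature is then fused with the associated omics features to obtain the corresponding fused omics feature, which is used as a first omics feature included in the first omics data.
In the embodiments of the present application, because the omics data to be processed that is used to determine the medical result fuses the target object's omics features with their associated features, the feature representation of the omics data to be processed is richer. This makes the medical analysis more comprehensive and precise and improves the accuracy of the medical analysis result.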
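To aid understanding of the method provided in the embodiments of this application, the method is described in detail below with reference to FIG. 2a. In this example, the omics data of N patients can be obtained, where the omics data of each patient includes K different omics features (the K omics features in the figure). The omics data of the N patients can be used as training data (the training data $X_{N \times K}$ in the figure) to train an initial graph neural network and obtain the analysis result prediction model; the model can then be used to perform medical analysis on a patient's omics data and obtain the final medical analysis result. In this example, the method is described in detail by taking as an example the determination of the medical analysis result for the omics data $V \in \mathbb{R}^{K}$ of one of the N patients (that is, the first omics data to be processed, which includes K different omics features). Specifically: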
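In some embodiments, determining the medical analysis result for a patient's omics data can include three parts: (a) gene co-expression analysis, (b) multi-level graph feature extraction and fusion, and (c) multi-task prediction. Parts (b) and (c) can be implemented by the trained analysis result prediction model, so the gene co-expression analysis is first performed on the patient's omics data and its result is then input to the analysis result prediction model to obtain the final medical analysis result. Specifically, after the patient's omics data is obtained, a correlation matrix between the different omics features can be computed using weighted gene co-expression network analysis (WGCNA). A threshold is then set and the elements of the correlation matrix are binarized, giving an edge matrix $E_{K \times K}$ of dimension K by K. The edge matrix $E_{K \times K}$ contains elements $a_{ij}$ ($i = 1, 2, \ldots, K$; $j = 1, 2, \ldots, K$), as shown in FIG. 2b. For example, for any element $a_{12}$ of the correlation matrix, if the degree of correlation between the two omics features represented by $a_{12}$ is greater than the threshold, the value of $a_{12}$ is set to 1; otherwise, it is set to 0.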
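The binarization step can be sketched as below. Note the substitution: plain absolute Pearson correlation from `np.corrcoef` stands in for WGCNA here, purely for illustration, and the function name and the sample-matrix layout (patients in rows, omics features in columns) are assumptions.

```python
import numpy as np

def build_edge_matrix(samples, threshold):
    """Binarize a K x K feature-correlation matrix into an edge matrix E.

    samples   : array of shape (N, K), N patients by K omics features
    threshold : correlation level above which two features are connected
    """
    samples = np.asarray(samples, dtype=float)
    # Absolute Pearson correlation between feature columns (stand-in for WGCNA).
    corr = np.abs(np.corrcoef(samples, rowvar=False))
    # a_ij = 1 when the correlation exceeds the threshold, otherwise 0.
    return (corr > threshold).astype(int)
```

A feature that is constant across all patients has undefined correlation (NaN) and therefore receives no edges in this sketch.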
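Further, each omics feature can be treated as a node, and the connections between nodes determined from the edge matrix to obtain the graph structure corresponding to the omics data. For example, for any two omics features, if the correlation between them is greater than or equal to a set value, an edge is established between the two corresponding nodes. Feature extraction can then be performed on the resulting graph structure (for example, through a fully connected layer, not shown in the figure) to obtain the first feature G1 of each node (in the figure, $G1 = G1(V_{K \times 1}, E_{K \times K})$ represents the process of obtaining the first feature of each node).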
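Further, two attention-based graph convolution (GAT) layers (the feature extraction layers described above, shown as GAT layers in the figure) can perform two rounds of feature extraction on the first feature G1 of each node, giving second features G2 and G3 of each node at two levels. To determine G2, the first GAT layer computes an attention-weighted sum of the first features of connected nodes, giving the second feature G2 of each node (in the figure, $G2 = G2(V_{K \times h2}, E_{K \times K})$ represents this process, where h2 denotes the second feature extraction). The second GAT layer then computes an attention-weighted sum of the second features G2 of connected nodes, giving the second feature G3 of each node (in the figure, $G3 = G3(V_{K \times h3}, E_{K \times K})$ represents this process, where h3 denotes the third feature extraction). At this point, each omics feature has features at three different levels: the local feature G1 (each node's feature contains only a single omics feature) and the global features G2 and G3 (each node's feature fuses the features of the omics features connected to it on the signaling pathway).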
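The attention-weighted aggregation performed by each GAT layer can be sketched in NumPy as follows. This is a single-head layer in the usual GAT formulation (shared linear projection, LeakyReLU-scored attention, softmax over each node and its connected neighbors); the parameter names, the LeakyReLU slope of 0.2 and the absence of an output nonlinearity are assumptions rather than details given in this application.

```python
import numpy as np

def gat_layer(node_features, edges, weight, attn_vector):
    """One single-head attention-weighted aggregation over connected nodes.

    node_features : (K, h_in) feature per node
    edges         : (K, K) binary edge matrix, 1 where two nodes are connected
    weight        : (h_in, h_out) shared linear projection
    attn_vector   : (2 * h_out,) attention parameters
    """
    edges = np.asarray(edges)
    h = np.asarray(node_features, dtype=float) @ weight   # (K, h_out)
    h_out = h.shape[1]
    # score[i, j] = LeakyReLU(a^T [h_i || h_j]), split into two dot products
    src = h @ attn_vector[:h_out]
    dst = h @ attn_vector[h_out:]
    scores = src[:, None] + dst[None, :]
    scores = np.where(scores > 0, scores, 0.2 * scores)
    # Each node attends to itself and to its connected neighbors only.
    mask = (edges + np.eye(len(edges))) > 0
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    attention = np.exp(scores)
    attention = attention / attention.sum(axis=1, keepdims=True)
    return attention @ h, attention
```

Under these assumptions, G2 would come from calling the layer on G1 and the edge matrix, and G3 from calling a second layer (with its own parameters) on G2 and the same edge matrix.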
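Further, the three levels of features G1, G2 and G3 can each be mapped to features of the same dimension by their own fully connected layers: G1 is mapped to $F_1 \in \mathbb{R}^{K}$, G2 to $F_2 \in \mathbb{R}^{K}$, and G3 to $F_3 \in \mathbb{R}^{K}$. The three levels of features $F_1$, $F_2$ and $F_3$ are then fused by concatenation into the fused feature $F \in \mathbb{R}^{3K}$, and disease diagnosis, disease typing or survival prediction can be performed based on $F \in \mathbb{R}^{3K}$.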
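The mapping and concatenation can be sketched as below. The text does not say how each fully connected layer reduces a K-by-h feature to a length-K vector; this sketch assumes one learned weight vector per level that projects each node's h-dimensional feature to a single scalar, and omits biases. The function and argument names are hypothetical.

```python
import numpy as np

def fuse_levels(level_features, projections):
    """Map each level to a length-K vector and concatenate the results.

    level_features : list of arrays, level l has shape (K, h_l)
    projections    : list of weight vectors, level l has shape (h_l,)
    """
    mapped = []
    for features, weight in zip(level_features, projections):
        # (K, h_l) @ (h_l,) -> (K,): one scalar per node for this level
        mapped.append(np.asarray(features, dtype=float) @ np.asarray(weight, dtype=float))
    return np.concatenate(mapped)  # length 3K for three levels
```

With three levels the result has length 3K regardless of the hidden sizes h2 and h3, which is what allows the downstream prediction layers to have a fixed input size.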
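Before disease diagnosis, disease typing or survival prediction is performed based on $F \in \mathbb{R}^{3K}$, further feature extraction (that is, feature mapping) can be performed through a fully connected network to obtain the feature $R^{d1}$ (d1 indicates that the feature R has d1 dimensions). Disease diagnosis, disease typing or survival prediction (part (c), multi-task prediction, in the figure) can then be performed based on the feature $R^{d1}$.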
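In some embodiments, when disease diagnosis or disease typing (disease classification and typing in the figure) is performed based on the feature $R^{d1}$, the feature $R^{d1}$ can be mapped to a feature whose dimension equals the number of disease types or disease categories (c disease types or categories in this example), and the disease prediction result or disease typing prediction result $R^{c}$ (the medical analysis result described above) is obtained from the mapped feature. The output y of the analysis result prediction model is then $R^{c}$ (that is, $y \in \mathbb{R}^{c}$), and $R^{c}$ represents the probability that the patient's omics data corresponds to each disease or to each disease category. When survival prediction is performed based on the feature $R^{d1}$ to determine the patient's survival probability, the survival probability $R^{1}$ corresponding to the patient's omics data (the medical analysis result described above) can be obtained from the feature $R^{d1}$, and the output y of the analysis result prediction model is then $R^{1}$ (that is, $y \in \mathbb{R}^{1}$).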
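As the above embodiments show, the method provided in the embodiments of this application simulates the biological cascade regulatory network by converting omics data into a graph structure and then uses a graph neural network to fully mine how the associations and interactions among omics data affect disease development. Graph structure features of different levels can be fused, so that both information at the level of single omics features and comprehensive features at the signaling pathway level are extracted. The state of the data is therefore represented better and more accurate prediction results are obtained. Moreover, the medical analysis result for a patient's omics data can be determined automatically by the analysis result prediction model without manual intervention, which saves data analysis and waiting time and avoids the problems caused by errors of human judgment. Compared with traditional technical solutions, the method analyzes omics data more intelligently and accurately, so that medical intervention can be provided more precisely and the practical needs of medical staff are met.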
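An embodiment of this application provides a graph neural network-based omics data processing apparatus 60. As shown in FIG. 3, the apparatus 60 may include a data acquisition module 601, a correlation determination module 602, a graph structure construction module 603, a node feature determination module 604 and an analysis result determination module 605, wherein:
the data acquisition module 601 is configured to obtain first omics data of a target object and extract at least two kinds of first omics features from the first omics data;
the correlation determination module 602 is configured to determine a first correlation between different omics features among the at least two kinds of first omics features;
the graph structure construction module 603 is configured to construct, based on the at least two kinds of first omics features and the first correlation, a first graph structure corresponding to the first omics data, wherein the first graph structure contains at least two nodes, each node represents one kind of first omics feature in the first omics data, the first graph structure contains at least one edge connecting the at least two nodes, and the edge represents the first correlation corresponding to the two nodes it connects;
the node feature determination module 604 is configured to obtain, based on the first graph structure and through a first graph neural network, the node feature of each node in the first graph structure, the node feature having at least one dimension;
the analysis result determination module 605 is configured to perform medical analysis on the target object based on the node features of the nodes to obtain a medical analysis result corresponding to each of the at least one dimension; the medical analysis includes disease diagnosis, disease typing and survival prediction for the target object; the medical analysis result includes, for each dimension, the probability that the target object has a disease, the probability that the target object's disease belongs to a certain disease category, and the survival probability of the target object.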
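In some embodiments, when constructing the first graph structure corresponding to the first omics data based on the at least two kinds of first omics features and the first correlations, the graph structure construction module is specifically configured to:
for any two kinds of first omics features among the at least two kinds of first omics features, if the first correlation between the two is greater than or equal to a set value, establish an edge between the two nodes corresponding to the two kinds of first omics features, so as to construct the first graph structure.
In some embodiments, the apparatus further includes a feature extraction module configured to:
for each node in the first graph structure, extract the first feature of each first omics feature, the first feature being a feature of that node that includes only the single first omics feature itself;
when obtaining the node feature of each node in the first graph structure based on the first graph structure and through the first graph neural network, the node feature determination module is specifically configured to:
for each node in the first graph structure, obtain, by the first graph neural network and based on the node and the target nodes connected to the node by an edge, second features of the node at at least one level, each level corresponding to one feature extraction layer of the first graph neural network;
for each node, fuse the first feature and the second features corresponding to the node to obtain the node feature of the node.
In some embodiments, for each node in the first graph structure, when obtaining the second features of the node at at least one level by the first graph neural network based on the node and the target nodes connected to it by an edge, the node feature determination module is specifically configured to:
obtain the initial feature of each node in the first graph structure;
for each node, determine, through the first graph neural network and based on the associated features of the node, the weight of each associated feature, wherein the associated features include the initial feature of the node and the initial features of the target nodes corresponding to the node;
for each node, perform, through the first graph neural network, weighted fusion of the associated features of the node based on their weights to obtain the second feature of the node at one level;
wherein, if a node has second features at at least two levels, the second feature at any level other than the first level is obtained based on the second feature at the preceding level.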
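In some embodiments, the feature extraction module is further configured to:
obtain at least one second omics data, wherein the first omics data and the at least one second omics data each belong to a different omics, and the at least one second omics data and the first omics data belong to the same target object;
extract the data feature corresponding to each second omics data;
when obtaining the medical analysis result based on the node features of the nodes, the analysis result determination module is specifically configured to:
determine the medical analysis result of the target object based on the node features of the nodes and the data features corresponding to the second omics data.
In some embodiments, each second omics data includes at least two kinds of second omics features;
for any second omics data, when extracting the data feature corresponding to the second omics data, the feature extraction module is specifically configured to:
determine a second correlation between different second omics features among the at least two kinds of second omics features of the second omics data;
construct, based on the at least two kinds of second omics features and the second correlations, a second graph structure corresponding to the second omics data;
obtain, based on the second graph structure and through a second graph neural network corresponding to the omics to which the second omics data belongs, the node feature of each node corresponding to the second omics data, thereby obtaining the data feature corresponding to the second omics data, the data feature including the node features of the nodes corresponding to the second omics data.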
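In some embodiments, obtaining the node feature of each node in the first graph structure based on the first graph structure and through the first graph neural network, and obtaining the medical analysis result based on the node features, are both performed by an analysis result prediction model, wherein the analysis result prediction model is obtained by training an initial neural network model on sample omics data.
In some embodiments, the apparatus further includes an information providing module configured to:
obtain the importance parameter value corresponding to each first omics feature;
provide the medical analysis result and the importance parameter value corresponding to each first omics feature to a user;
wherein the importance parameter value of each first omics feature is determined as follows:
for each sample omics data, determining, based on the medical analysis result of that sample omics data, the importance of the node corresponding to each omics feature in the graph structure corresponding to that sample omics data;
for any node, obtaining the importance parameter value of the node based on the importance of that node across all sample omics data, and taking the importance parameter value of the node as the importance parameter value of the omics feature corresponding to the node.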
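In some embodiments, the analysis result prediction model is obtained as follows:
obtaining a training data set and an initial neural network model, the training data set including the sample omics data and an annotation label for each sample omics data, the annotation label representing the true medical analysis result;
dividing the training data set into different sub-data sets;
iteratively training the initial neural network model on each of the different sub-data sets until a preset training end condition is met;
fusing the model parameters of the neural network models obtained at the end of each training, and using the fused model parameters as the model parameters of the analysis result prediction model.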
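The split-train-fuse procedure above can be sketched as follows. The text does not specify how the sub-data sets are formed or how the parameters are fused; this sketch assumes a random partition and element-wise averaging of parameters, and both helper names are hypothetical. The per-subset training loop itself is left out.

```python
import numpy as np

def split_dataset(n_samples, n_subsets, seed=0):
    """Randomly partition sample indices into n_subsets disjoint sub-data sets."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_subsets)

def fuse_parameters(param_sets):
    """Fuse several trained parameter sets by element-wise averaging (assumed rule).

    param_sets : list of dicts mapping a parameter name to an array,
                 one dict per model trained on one sub-data set
    """
    names = param_sets[0].keys()
    return {name: np.mean([p[name] for p in param_sets], axis=0) for name in names}
```

Each index array returned by `split_dataset` would select the samples used for one training run, and the parameter dicts collected at the end of those runs would be passed to `fuse_parameters`.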
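In some embodiments, when obtaining the first omics data to be processed, the data acquisition module is specifically configured to:
obtain initial omics data, the initial omics data including at least two kinds of initial omics features;
obtain associated omics features of the initial omics data, the associated omics features and the initial omics data belonging to the same target object, and the associated omics features including at least one of case omics features or radiomics features;
fuse each kind of initial omics feature with the associated omics features to obtain a fused omics feature corresponding to each kind of initial omics feature, and take the fused omics feature as the first omics feature.
In some embodiments, the medical analysis result includes at least one of a disease identification result, a disease typing result or a survival prediction result.
The graph neural network-based omics data processing apparatus of the embodiments of this application can perform the graph neural network-based omics data processing method provided in the embodiments of this application; the implementation principle is similar and is not repeated here.
The graph neural network-based omics data processing apparatus may be a computer program (including program code) running in a computer device; for example, the apparatus is application software. The apparatus can be used to perform the corresponding steps of the method provided in the embodiments of this application.
In some embodiments, the graph neural network-based omics data processing apparatus provided in the embodiments of this application may be implemented by a combination of software and hardware. As an example, the apparatus may be a processor in the form of a hardware decoding processor that is programmed to perform the graph neural network-based omics data processing method provided in the embodiments of this application; for example, the processor in the form of a hardware decoding processor may use one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), programmable logic devices (PLD), complex programmable logic devices (CPLD), field-programmable gate arrays (FPGA) or other electronic components.
In other embodiments, the graph neural network-based omics data processing apparatus provided in the embodiments of this application may be implemented in software. FIG. 3 shows the graph neural network-based omics data processing apparatus 60 stored in a memory, which may be software in the form of a program, a plug-in or the like and includes a series of modules: the data acquisition module 601, the correlation determination module 602, the graph structure construction module 603, the node feature determination module 604 and the analysis result determination module 605. These modules are used to implement the graph neural network-based omics data processing method provided in the embodiments of this application.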
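An embodiment of this application provides an electronic device. As shown in FIG. 4, the electronic device 2000 includes a processor 2001 and a memory 2003, which are connected, for example, through a bus 2002. Optionally, the electronic device 2000 may further include a transceiver 2004. It should be noted that, in practical applications, the number of transceivers 2004 is not limited to one, and the structure of the electronic device 2000 does not limit the embodiments of this application.
The processor 2001 is applied in the embodiments of this application to implement the functions of the modules shown in FIG. 3.
The processor 2001 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a transistor logic device, a hardware component or any combination thereof. It can implement or execute the various exemplary logical blocks, modules and circuits described in connection with the disclosure of this application. The processor 2001 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 2002 may include a path for transferring information between the above components. The bus 2002 may be a PCI bus, an EISA bus or the like, and may be divided into an address bus, a data bus, a control bus and so on. For ease of representation, only one thick line is used in FIG. 4, but this does not mean that there is only one bus or one type of bus.
The memory 2003 may be a ROM or another type of static storage device that can store static information and computer programs, a RAM or another type of dynamic storage device that can store information and computer programs, an EEPROM, a CD-ROM or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs and the like), a magnetic disk storage medium or another magnetic storage device, or any other medium that can carry or store a desired computer program in the form of a data structure and can be accessed by a computer, but is not limited thereto.
The memory 2003 is configured to store the computer program of the application program that executes the solution of this application, and its execution is controlled by the processor 2001. The processor 2001 is configured to execute the computer program of the application program stored in the memory 2003, so as to implement the actions of the graph neural network-based omics data processing apparatus provided in the embodiment shown in FIG. 3.
An embodiment of this application provides an electronic device including a processor and a memory; the memory is configured to store a computer program that, when executed by the processor, causes the processor to perform the method of any one of the above embodiments.
An embodiment of this application provides a computer-readable storage medium configured to store a computer program that, when run on a computer, enables the computer to perform the method of any one of the above embodiments.
According to one aspect of this application, a computer program product or computer program is provided, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the methods provided in the various optional implementations above.
For the terms and implementation principles involved in the computer-readable storage medium of this application, reference may be made to the graph neural network-based omics data processing method in the embodiments of this application; details are not repeated here.
It should be understood that although the steps in the flowcharts of the drawings are shown in sequence as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
The above are only some implementations of this application. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principles of this application, and these improvements and refinements shall also be regarded as falling within the scope of protection of this application.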
Claims (15)
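- A graph neural network-based clinical omics data processing method, performed by an electronic device, comprising: obtaining first omics data of a target object; extracting at least two kinds of first omics features from the first omics data; determining a first correlation between different omics features among the at least two kinds of first omics features; constructing, based on the at least two kinds of first omics features and the first correlation, a first graph structure corresponding to the first omics data, wherein the first graph structure contains at least two nodes, each node represents one kind of the first omics features in the first omics data, the first graph structure contains at least one edge connecting the at least two nodes, and the edge represents the first correlation corresponding to the two nodes it connects; obtaining, based on the first graph structure and through a first graph neural network, a node feature of each node in the first graph structure, the node feature having at least one dimension; and performing medical analysis on the target object based on the node features of the nodes to obtain a medical analysis result corresponding to each of the at least one dimension; wherein the medical analysis comprises disease diagnosis, disease typing and survival prediction for the target object, and the medical analysis result comprises, for each dimension, a probability that the target object has a disease, a probability that the disease of the target object belongs to a certain disease category, and a survival probability of the target object.
- The method according to claim 1, wherein constructing, based on the at least two kinds of first omics features and the first correlation, the first graph structure corresponding to the first omics data comprises: for any two kinds of the first omics features among the at least two kinds of first omics features, if the first correlation between the two kinds of first omics features is greater than or equal to a set value, establishing an edge between the two nodes corresponding to the two kinds of first omics features, so as to construct the first graph structure.
- The method according to claim 1, further comprising: for each node in the first graph structure, extracting a first feature of the first omics feature, the first feature being a feature of that node that includes only a single first omics feature; wherein obtaining, based on the first graph structure and through the first graph neural network, the node feature of each node in the first graph structure comprises: for each node in the first graph structure, obtaining, by the first graph neural network and based on the node and the target nodes connected to the node by an edge, second features of the node at at least one level, each level corresponding to one feature extraction layer of the first graph neural network; and for each node, fusing the first feature and the second features corresponding to the node to obtain the node feature of the node.
- The method according to claim 3, wherein, for each node in the first graph structure, obtaining, by the first graph neural network and based on the node and the target nodes connected to the node by an edge, the second features of the node at at least one level comprises: obtaining an initial feature of each node in the first graph structure; for each node, determining, through the first graph neural network and based on the associated features of the node, a weight of each associated feature, wherein the associated features of the node include the initial feature of the node and the initial features of the target nodes connected to the node by an edge; and for each node, performing, through the first graph neural network, weighted fusion of the associated features of the node based on their weights to obtain the second feature of the node at one level; wherein, if a node has second features at at least two levels, the second feature at any level other than the first level is obtained based on the second feature at the preceding level.
- The method according to claim 4, wherein obtaining the initial feature of each node in the first graph structure comprises: when the second feature at the first level of each node is being determined, taking the first feature corresponding to each node as the initial feature of the node; and when the second feature at any level other than the first level is being determined, taking the second feature at the preceding level as the initial feature of each node.
- The method according to claim 1, further comprising: obtaining at least one second omics data, wherein the first omics data and the at least one second omics data each belong to a different omics, and the at least one second omics data and the first omics data belong to the same target object; and extracting a data feature corresponding to each second omics data; wherein obtaining the medical analysis result based on the node features of the nodes comprises: determining the medical analysis result of the target object based on the node features of the nodes and the data features corresponding to the second omics data.
- The method according to claim 6, wherein each second omics data includes at least two kinds of second omics features, and, for any second omics data, extracting the data feature corresponding to the second omics data comprises: determining a second correlation between different second omics features among the at least two kinds of second omics features of the second omics data; constructing, based on the at least two kinds of second omics features and the second correlations, a second graph structure corresponding to the second omics data; and obtaining, based on the second graph structure and through a second graph neural network corresponding to the omics to which the second omics data belongs, the node feature of each node corresponding to the second omics data, thereby obtaining the data feature corresponding to the second omics data, the data feature including the node features of the nodes corresponding to the second omics data.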
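- The method according to claim 1, wherein obtaining, based on the first graph structure and through the first graph neural network, the node feature of each node in the first graph structure, and obtaining the medical analysis result based on the node features of the nodes, are performed by an analysis result prediction model, wherein the analysis result prediction model is obtained by training an initial neural network model on sample omics data.
- The method according to claim 8, further comprising: determining an importance parameter value corresponding to each first omics feature; and providing the medical analysis result and the importance parameter value corresponding to each first omics feature to a user; wherein the importance parameter value of each first omics feature is determined as follows: for each sample omics data, determining, based on the medical analysis result of that sample omics data, the importance of the node corresponding to each omics feature in the graph structure corresponding to that sample omics data; and for any node, obtaining the importance parameter value of the node based on the importance of that node across all sample omics data, and taking the importance parameter value of the node as the importance parameter value of the omics feature corresponding to the node.
- The method according to claim 8, wherein the analysis result prediction model is obtained as follows: obtaining a training data set and an initial neural network model, the training data set including the sample omics data and an annotation label for each sample omics data, the annotation label representing the true medical analysis result; dividing the training data set into different sub-data sets; iteratively training the initial neural network model on each of the different sub-data sets until a preset training end condition is met; and fusing the model parameters of the neural network models obtained at the end of each training, and using the fused model parameters as the model parameters of the analysis result prediction model.
- The method according to claim 1, wherein obtaining the first omics data of the target object comprises: obtaining initial omics data of the target object, the initial omics data including at least two kinds of initial omics features; obtaining associated omics features of the initial omics data, the associated omics features and the initial omics data belonging to the same target object, and the associated omics features including at least one of case omics features or radiomics features; and fusing each kind of initial omics feature with the associated omics features to obtain a fused omics feature corresponding to each kind of initial omics feature, and taking the fused omics feature as one kind of first omics feature.
- A graph neural network-based clinical omics data processing apparatus, comprising: a data acquisition module configured to obtain first omics data of a target object and extract at least two kinds of first omics features from the first omics data; a correlation determination module configured to determine a first correlation between different omics features among the at least two kinds of first omics features; a graph structure construction module configured to construct, based on the at least two kinds of first omics features and the first correlation, a first graph structure corresponding to the first omics data, wherein the first graph structure contains at least two nodes, each node represents one kind of the first omics features in the first omics data, the first graph structure contains at least one edge connecting the at least two nodes, and the edge represents the first correlation corresponding to the two nodes it connects; a node feature determination module configured to obtain, based on the first graph structure and through a first graph neural network, a node feature of each node in the first graph structure, the node feature having at least one dimension; and an analysis result determination module configured to perform medical analysis on the target object based on the node features of the nodes to obtain a medical analysis result corresponding to each of the at least one dimension; wherein the medical analysis comprises disease diagnosis, disease typing and survival prediction for the target object, and the medical analysis result comprises, for each dimension, a probability that the target object has a disease, a probability that the disease of the target object belongs to a certain disease category, and a survival probability of the target object.
- An electronic device, comprising a processor and a memory, wherein the memory is configured to store a computer program that, when executed by the processor, causes the processor to perform the method according to any one of claims 1-11.
- A computer-readable storage medium configured to store a computer program that, when run on a computer, enables the computer to perform the method according to any one of claims 1-11.
- A computer program product comprising computer instructions stored in a computer-readable storage medium, wherein a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the method according to any one of claims 1-11.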
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP21896886.5A EP4198821A4 (en) | 2020-11-30 | 2021-11-19 | METHOD AND DEVICE, DEVICE AND MEDIUM FOR PROCESSING CLINICAL OMICS DATA ON A GRAPHNEURAL NETWORK BASE |
JP2023514943A JP7466058B2 (ja) | 2020-11-30 | 2021-11-19 | グラフニューラルネットワークに基づく臨床オミックスデータ処理方法、装置、電子機器、及びコンピュータプログラム |
US17/956,141 US20230028046A1 (en) | 2020-11-30 | 2022-09-29 | Clinical omics data processing method and apparatus based on graph neural network, device and medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011379315.3 | 2020-11-30 | ||
CN202011379315.3A CN112364880B (zh) | 2020-11-30 | 2020-11-30 | 基于图神经网络的组学数据处理方法、装置、设备及介质 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/956,141 Continuation US20230028046A1 (en) | 2020-11-30 | 2022-09-29 | Clinical omics data processing method and apparatus based on graph neural network, device and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022111385A1 true WO2022111385A1 (zh) | 2022-06-02 |
Family
ID=74535703
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/131652 WO2022111385A1 (zh) | 2020-11-30 | 2021-11-19 | 基于图神经网络的临床组学数据处理方法、装置、设备及介质 |
Country Status (5)
Country | Link |
---|---|
US (1) | US20230028046A1 (zh) |
EP (1) | EP4198821A4 (zh) |
JP (1) | JP7466058B2 (zh) |
CN (1) | CN112364880B (zh) |
WO (1) | WO2022111385A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116110509A (zh) * | 2022-11-15 | 2023-05-12 | 浙江大学 | 基于组学一致性预训练的药物敏感性预测方法和装置 |
CN116741397A (zh) * | 2023-08-15 | 2023-09-12 | 数据空间研究院 | 基于多组学数据融合的癌症分型方法、系统及存储介质 |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112364880B (zh) * | 2020-11-30 | 2022-06-14 | 腾讯科技(深圳)有限公司 | 基于图神经网络的组学数据处理方法、装置、设备及介质 |
US20220301716A1 (en) * | 2021-03-19 | 2022-09-22 | Canon Medical Systems Corporation | Medical information processing apparatus, medical information learning apparatus, medical information display apparatus, and medical information processing method |
CN113409306A (zh) * | 2021-07-15 | 2021-09-17 | 推想医疗科技股份有限公司 | 一种检测装置、训练方法、训练装置、设备和介质 |
CN113611366B (zh) * | 2021-07-26 | 2022-04-29 | 哈尔滨工业大学(深圳) | 基于图神经网络的基因模块挖掘方法、装置、计算机设备 |
CN113628726B (zh) * | 2021-08-10 | 2023-12-26 | 海南榕树家信息科技有限公司 | 基于图神经网络的中医辨治推荐系统、方法和电子设备 |
CN113889188A (zh) * | 2021-10-22 | 2022-01-04 | 赛业(广州)生物科技有限公司 | 一种疾病预测方法、系统、计算机设备及介质 |
CN114664382B (zh) * | 2022-04-28 | 2023-01-31 | 中国人民解放军总医院 | 多组学联合分析方法、装置及计算设备 |
CN115223657B (zh) * | 2022-09-20 | 2022-12-06 | 吉林农业大学 | 一种药用植物转录调控图谱预测方法 |
CN115952770B (zh) * | 2023-03-15 | 2023-07-25 | 广州汇通国信科技有限公司 | 一种数据标准化的处理方法、装置、电子设备及存储介质 |
CN116386850B (zh) * | 2023-03-28 | 2023-11-28 | 数坤科技股份有限公司 | 医学数据分析方法、装置、计算机设备及存储介质 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019220128A1 (en) * | 2018-05-18 | 2019-11-21 | Benevolentai Technology Limited | Graph neutral networks with attention |
CN111028939A (zh) * | 2019-11-15 | 2020-04-17 | 华南理工大学 | 一种基于深度学习的多组学智能诊断系统 |
CN111681705A (zh) * | 2020-05-21 | 2020-09-18 | 中国科学院深圳先进技术研究院 | 一种miRNA-疾病关联预测方法、系统、终端以及存储介质 |
CN111931076A (zh) * | 2020-09-22 | 2020-11-13 | 平安国际智慧城市科技股份有限公司 | 基于有权有向图进行关系推荐的方法、装置和计算机设备 |
CN112364880A (zh) * | 2020-11-30 | 2021-02-12 | 腾讯科技(深圳)有限公司 | 基于图神经网络的组学数据处理方法、装置、设备及介质 |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3550568B8 (en) | 2018-04-07 | 2024-08-14 | Tata Consultancy Services Limited | Graph convolution based gene prioritization on heterogeneous networks |
CN111276258B (zh) * | 2020-01-15 | 2022-10-14 | 大连理工大学 | 一种基于领域知识的药物致病关系抽取方法 |
CN111933212B (zh) * | 2020-08-26 | 2024-02-27 | 腾讯科技(深圳)有限公司 | 一种基于机器学习的临床组学数据处理方法及装置 |
-
2020
- 2020-11-30 CN CN202011379315.3A patent/CN112364880B/zh active Active
-
2021
- 2021-11-19 JP JP2023514943A patent/JP7466058B2/ja active Active
- 2021-11-19 WO PCT/CN2021/131652 patent/WO2022111385A1/zh unknown
- 2021-11-19 EP EP21896886.5A patent/EP4198821A4/en active Pending
-
2022
- 2022-09-29 US US17/956,141 patent/US20230028046A1/en active Pending
Non-Patent Citations (1)
Title |
---|
See also references of EP4198821A4 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116110509A (zh) * | 2022-11-15 | 2023-05-12 | 浙江大学 | 基于组学一致性预训练的药物敏感性预测方法和装置 |
CN116110509B (zh) * | 2022-11-15 | 2023-08-04 | 浙江大学 | 基于组学一致性预训练的药物敏感性预测方法和装置 |
CN116741397A (zh) * | 2023-08-15 | 2023-09-12 | 数据空间研究院 | 基于多组学数据融合的癌症分型方法、系统及存储介质 |
CN116741397B (zh) * | 2023-08-15 | 2023-11-03 | 数据空间研究院 | 基于多组学数据融合的癌症分型方法、系统及存储介质 |
Also Published As
Publication number | Publication date |
---|---|
CN112364880B (zh) | 2022-06-14 |
CN112364880A (zh) | 2021-02-12 |
US20230028046A1 (en) | 2023-01-26 |
JP2023542837A (ja) | 2023-10-12 |
EP4198821A1 (en) | 2023-06-21 |
JP7466058B2 (ja) | 2024-04-11 |
EP4198821A4 (en) | 2024-03-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21896886 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2023514943 Country of ref document: JP Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 2021896886 Country of ref document: EP Effective date: 20230314 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |