WO2023134062A1 - Artificial intelligence-based method and device for determining drug-target action relationships - Google Patents
- Publication number
- WO2023134062A1 (international application PCT/CN2022/089690)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- characterization information
- target
- drug
- protein
- molecular structure
Concepts (machine-extracted)
- 239000003596 drug target Substances 0.000 title claims abstract description 116
- 230000003993 interaction Effects 0.000 title claims abstract description 69
- 238000000034 method Methods 0.000 title claims abstract description 55
- 238000013473 artificial intelligence Methods 0.000 title claims abstract description 32
- 102000004169 proteins and genes Human genes 0.000 claims abstract description 222
- 108090000623 proteins and genes Proteins 0.000 claims abstract description 222
- 238000012512 characterization method Methods 0.000 claims abstract description 200
- 239000003814 drug Substances 0.000 claims abstract description 128
- 229940079593 drug Drugs 0.000 claims abstract description 128
- 230000004927 fusion Effects 0.000 claims abstract description 104
- 238000012545 processing Methods 0.000 claims abstract description 44
- 238000012549 training Methods 0.000 claims description 48
- 230000009471 action Effects 0.000 claims description 44
- 150000001875 compounds Chemical class 0.000 claims description 14
- 238000000605 extraction Methods 0.000 claims description 12
- 239000011159 matrix material Substances 0.000 claims description 12
- 238000003062 neural network model Methods 0.000 claims description 12
- 238000002598 diffusion tensor imaging Methods 0.000 claims description 10
- 238000004458 analytical method Methods 0.000 claims description 3
- 230000006916 protein interaction Effects 0.000 claims 3
- 235000018102 proteins Nutrition 0.000 description 137
- 230000000875 corresponding effect Effects 0.000 description 12
- 238000005516 engineering process Methods 0.000 description 12
- 201000010099 disease Diseases 0.000 description 11
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 11
- 238000012790 confirmation Methods 0.000 description 8
- 238000004891 communication Methods 0.000 description 7
- 239000013598 vector Substances 0.000 description 7
- 238000010586 diagram Methods 0.000 description 6
- 239000000126 substance Substances 0.000 description 6
- 230000008901 benefit Effects 0.000 description 4
- 238000006243 chemical reaction Methods 0.000 description 4
- 230000000694 effects Effects 0.000 description 4
- 235000001014 amino acid Nutrition 0.000 description 3
- 150000001413 amino acids Chemical class 0.000 description 3
- 230000008030 elimination Effects 0.000 description 3
- 238000003379 elimination reaction Methods 0.000 description 3
- 230000009977 dual effect Effects 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 230000005012 migration Effects 0.000 description 2
- 238000013508 migration Methods 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 230000008569 process Effects 0.000 description 2
- QNAYBMKLOCPYGJ-REOHCLBHSA-N L-alanine Chemical compound C[C@H](N)C(O)=O QNAYBMKLOCPYGJ-REOHCLBHSA-N 0.000 description 1
- KZSNJWFQEVHDMF-BYPYZUCNSA-N L-valine Chemical compound CC(C)[C@H](N)C(O)=O KZSNJWFQEVHDMF-BYPYZUCNSA-N 0.000 description 1
- KZSNJWFQEVHDMF-UHFFFAOYSA-N Valine Natural products CC(C)C(N)C(O)=O KZSNJWFQEVHDMF-UHFFFAOYSA-N 0.000 description 1
- 230000004931 aggregating effect Effects 0.000 description 1
- 235000004279 alanine Nutrition 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000002457 bidirectional effect Effects 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 238000002595 magnetic resonance imaging Methods 0.000 description 1
- 238000007726 management method Methods 0.000 description 1
- 239000000203 mixture Substances 0.000 description 1
- 238000003058 natural language processing Methods 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 125000002924 primary amino group Chemical group [H]N([H])* 0.000 description 1
- 230000001737 promoting effect Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 238000012827 research and development Methods 0.000 description 1
- 230000001225 therapeutic effect Effects 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
- 239000004474 valine Substances 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/30—Drug targeting using structural data; Docking or binding prediction
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/30—Prediction of properties of chemical compounds, compositions or mixtures
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/50—Molecular design, e.g. of drugs
Definitions
- The present application relates to the technical field of intelligent healthcare, in particular to an artificial intelligence-based method and device for determining drug-target action relationships.
- Drug targets confirmed solely on the basis of chemical properties capture only a single type of relationship with a disease and therefore cannot be applied effectively.
- As a result, drug-target relationships are confirmed with poor accuracy, which lowers the efficiency of drug use. There is therefore an urgent need for an artificial intelligence-based method for determining drug-target action relationships that solves the above problems.
- The present application provides an artificial intelligence-based method and device for determining drug-target action relationships, whose main purpose is to solve the existing problem that drug-target action relationships cannot be confirmed accurately.
- An artificial intelligence-based method for determining drug-target action relationships, including:
- performing prediction processing on the feature-fused molecular structure characterization information and protein target characterization information, and using the obtained prediction result as the drug-target interaction relationship.
- An artificial intelligence-based device for determining drug-target action relationships, including:
- an obtaining module, configured to obtain drug molecular image data and protein sequence data of a target drug;
- an extraction module, configured to extract molecular structure characterization information from the drug molecular image data, and to extract protein target characterization information from the protein sequence data;
- a determination module, configured to obtain, from a knowledge graph, a feature fusion coefficient matching the molecular structure characterization information and the protein target characterization information, and to perform feature fusion on the molecular structure characterization information and the protein target characterization information based on the feature fusion coefficient;
- a processing module, configured to perform prediction processing on the feature-fused molecular structure characterization information and protein target characterization information based on a trained dual-task prediction model, and to use the obtained prediction result as the drug-target interaction relationship.
- A computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement an artificial intelligence-based method for determining drug-target action relationships, including:
- performing prediction processing on the feature-fused molecular structure characterization information and protein target characterization information, and using the obtained prediction result as the drug-target interaction relationship.
- A computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the artificial intelligence-based method for determining drug-target action relationships, including:
- performing prediction processing on the feature-fused molecular structure characterization information and protein target characterization information, and using the obtained prediction result as the drug-target interaction relationship.
- The technical solution provided by the embodiments of the present application has at least the following advantages:
- The present application provides an artificial intelligence-based method and device for determining drug-target action relationships: molecular structure characterization information is extracted from the drug molecular image data, and protein target characterization information is extracted from the protein sequence data; feature fusion coefficients matching the molecular structure characterization information and the protein target characterization information are obtained from the knowledge graph, and feature fusion is performed on the two based on the feature fusion coefficients;
- prediction processing is then performed on the feature-fused molecular structure characterization information and protein target characterization information, and the prediction result is used as the drug-target action relationship. This increases the diversity with which relationships between drug targets and diseases are confirmed and accommodates the elimination of drug targets across multiple diseases, thereby improving the accuracy with which drug-target action relationships are confirmed and greatly improving the efficiency of drug use.
- Fig. 1 shows a flow chart of an artificial intelligence-based method for determining drug-target action relationships according to an embodiment of the present application;
- Fig. 2 shows a schematic structural diagram of the dual-task prediction model according to an embodiment of the present application;
- Fig. 3 shows a flow chart of another artificial intelligence-based method for determining drug-target action relationships according to an embodiment of the present application;
- Fig. 4 shows a flow chart of yet another artificial intelligence-based method for determining drug-target action relationships according to an embodiment of the present application;
- Fig. 5 shows a schematic structural diagram of the protein sequence target prediction model according to an embodiment of the present application;
- Fig. 6 shows a block diagram of the composition of an artificial intelligence-based device for determining drug-target action relationships according to an embodiment of the present application;
- Fig. 7 shows a schematic structural diagram of a computer device according to an embodiment of the present application.
- AI: artificial intelligence
- The embodiments of the present application may acquire and process relevant data based on artificial intelligence technology.
- Artificial intelligence is the theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
- Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics.
- AI software technologies mainly include computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.
- A method for determining drug-target action relationships based on artificial intelligence is provided, illustrated by applying it to a computer device such as a server. The server may be an independent server, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery network (CDN), big data, and AI platforms, for example an intelligent medical system or a digital medical platform.
- The method includes the following steps:
- The execution subject may be an intelligent management system with data processing functions, for example an intelligent medical system or a digital medical platform.
- Here, the execution subject is an intelligent medical system.
- The target drug is the drug whose molecular features are to be matched against protein features to determine the drug-target interaction relationship.
- The drug molecular structure image data of the target drug represent the molecule of the target drug as a graph structure.
- The image content in the drug molecular structure image data is the atom and chemical-bond structure of the target drug molecule, from which molecular structure information such as spatial features, atomic numbers, and charge numbers can be abstracted in the form of nodes and edges.
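To make the node-and-edge abstraction concrete, here is a minimal Python/NumPy sketch, assuming the atoms and bonds have already been read off the structure image (the source does not say how). The function name `molecule_to_graph` and the two-column node features (atomic number, formal charge) are illustrative assumptions; spatial features and bond orders are omitted, so every bond becomes an unweighted edge. L-alanine (heavy atoms only) serves as the example molecule.

```python
import numpy as np

def molecule_to_graph(atoms, bonds):
    """Build the adjacency matrix A and node-attribute matrix X of a molecular graph.

    atoms: list of (atomic_number, formal_charge) tuples, one per heavy atom (node).
    bonds: list of (i, j) atom-index pairs, one per chemical bond (edge).
    """
    n = len(atoms)
    A = np.zeros((n, n), dtype=np.float32)
    for i, j in bonds:
        A[i, j] = A[j, i] = 1.0  # bonds are undirected; bond order is ignored here
    X = np.array(atoms, dtype=np.float32)  # node attributes: [atomic number, charge]
    return A, X

# L-alanine, heavy atoms only (SMILES C[C@H](N)C(O)=O):
# 0 methyl C, 1 alpha C, 2 amino N, 3 carboxyl C, 4 hydroxyl O, 5 carbonyl O
atoms = [(6, 0), (6, 0), (7, 0), (6, 0), (8, 0), (8, 0)]
bonds = [(0, 1), (1, 2), (1, 3), (3, 4), (3, 5)]
A, X = molecule_to_graph(atoms, bonds)
print(A.sum(axis=1))  # degree of each atom: [1. 3. 1. 3. 1. 1.]
```

A and X produced this way are the kind of input the GIN model described below consumes.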
- The protein sequence data characterize a protein as an ordered arrangement of 20 different letters (amino acids). A protein sequence is generally hundreds to thousands of residues long, and the protein sequence data record the order of the letters for all of its amino acids.
- For example, with the one-letter codes G for glycine, A for alanine, and V for valine, the ordering might read A1-V2-G3-A4..., where the number gives the position in the sequence. Feature extraction is then performed on the drug molecular image data and the protein sequence data.
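Before a sequence model can use the letter ordering described above, each residue is typically mapped to an integer token. The sketch below is one minimal way to do this in Python, assuming the 20 standard one-letter codes, a fixed maximum length, and index 0 reserved for padding; `AMINO_ACIDS`, `TOKEN_ID`, and `encode_protein` are hypothetical names, not part of the source.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter amino-acid codes
# Token 0 is reserved for padding (an assumption of this sketch).
TOKEN_ID = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}

def encode_protein(sequence, max_len):
    """Map a protein sequence to integer tokens, truncated/padded to max_len.

    Raises KeyError for non-standard letters (e.g. B, X, Z); a real pipeline
    would need an explicit policy for those.
    """
    ids = [TOKEN_ID[aa] for aa in sequence.upper()[:max_len]]
    return ids + [0] * (max_len - len(ids))

print(encode_protein("AVGA", 6))  # [1, 18, 6, 1, 0, 0]
```

Position k in the returned list corresponds to residue k+1 in the "A1-V2-G3-A4" notation.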
- In this embodiment, the drug molecular structure image data of the target drug are generated by the intelligent medical system (the execution subject) using computer software for drawing molecular structure diagrams, and then loaded.
- The operator can obtain drug molecular structure image data matching the target drug from the drug database already stored in the intelligent medical system, or create them with a molecular-structure drawing application and save them in a specified file format for the intelligent medical system to acquire; this is not specifically limited in the embodiments of the present application.
- The protein sequence data may be entered in advance by the operator or loaded directly from existing protein sequence data; this is not specifically limited in the embodiments of the present application.
- Features are extracted separately from the two inputs: molecular structure characterization information is extracted from the drug molecular image data, and protein target characterization information is extracted from the protein sequence data.
- The molecular structure characterization information describes the main or specific features of the molecular structure in the drug molecular image data.
- The protein target characterization information describes the main or specific features of the protein target in the protein sequence data.
- Specifically, the protein target is the site in the protein sequence where the drug is expected to bind, which is used to determine whether the protein has an action relationship with the target drug.
- For the molecular structure characterization information and the protein target characterization information obtained by feature extraction, the corresponding feature fusion coefficients are determined from the knowledge graph and used for feature fusion.
- The knowledge graph is constructed from a magnetic resonance diffusion tensor imaging (DTI) dataset.
- The knowledge graph contains at least two nodes, corresponding to different molecular structure characterization information and different protein target characterization information; a link between nodes indicates that an action relationship exists. The feature fusion coefficient is calculated from the action relationship represented by the link, and feature fusion is performed according to that coefficient.
- The length of a link represents the strength of the action relationship: the longer the link, the weaker the corresponding action relationship.
- A threshold is preset for the link length: if the link length is n times the threshold, the feature fusion coefficient is n tenths (n/10). The feature fusion coefficient is calculated in this way.
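The threshold rule above amounts to a one-line conversion. This Python sketch assumes non-integer multiples are kept as-is rather than rounded, and it does not clip the result, because the source does not say what happens when a link is longer than ten thresholds; `fusion_coefficient` is a hypothetical name.

```python
def fusion_coefficient(link_length: float, threshold: float) -> float:
    """Link length equal to n times the threshold -> feature fusion coefficient n/10."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    n = link_length / threshold  # how many multiples of the threshold the link spans
    return n / 10

print(fusion_coefficient(6.0, 2.0))  # 0.3
```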
- Feature fusion means that, when the molecular structure characterization information and the protein target characterization information are converted into feature vectors, both feature vectors are normalized into a common value range according to the feature fusion coefficient, yielding input parameters that the dual-task prediction model can use for prediction.
- Specifically, the molecular structure characterization information and the protein target characterization information are each converted into feature vectors, i.e., text or image data are converted into vector matrices. During conversion, the feature fusion coefficient serves as the conversion coefficient and is multiplied with the vector matrices, giving feature-vector matrices of both within one numerical range, which are then used as the input parameters of the dual-task prediction model; this is not specifically limited in the embodiments of the present application.
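The source does not specify the normalization, so the sketch below assumes min-max scaling to [0, 1] followed by multiplication with the feature fusion coefficient. The order matters: min-max scaling is invariant to multiplication by a positive constant, so applying the coefficient first would cancel it out. `fuse_features` is a hypothetical name.

```python
import numpy as np

def fuse_features(drug_vec, target_vec, coef):
    """Bring both feature vectors into [0, coef] so they share one numeric range."""
    def min_max(v):
        v = np.asarray(v, dtype=np.float64)
        span = v.max() - v.min()
        # A constant vector carries no spread; map it to zeros instead of dividing by 0.
        return np.zeros_like(v) if span == 0 else (v - v.min()) / span
    # Normalize first, then scale by the fusion coefficient (see note above).
    return coef * min_max(drug_vec), coef * min_max(target_vec)

drug_in, target_in = fuse_features([2.0, 4.0, 6.0], [10.0, 0.0, 5.0], 0.3)
```

The two returned arrays correspond to the two inputs of the dual-task prediction model.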
- The dual-task prediction model is a hybrid neural network model with two input parameters and one output result. As shown in Fig. 2, the molecular structure characterization information Drug1 (derived from the molecular structure) and the protein target characterization information Target1 (derived from the protein sequence)
- are each converted into feature vectors and fed to the model as its two inputs; the trained dual-task prediction model then performs prediction, and the prediction result is taken as the drug-target interaction relationship.
- The drug-target interaction relationship is expressed as a numerical value of the interaction between the drug molecular features and the protein target features.
- Before extracting molecular structure characterization information from the drug molecular image data in step 102, the method further includes:
- Extracting molecular structure characterization information from the drug molecular image data specifically includes:
- A graph isomorphism network model is trained on a large amount of unlabeled drug molecule graph training data to obtain a general graph isomorphism network model that can be transferred to support training on uncertain (unlabeled) sample data.
- For example, an unlabeled compound graph isomorphism network model is built, i.e., an unlabeled compound GIN model is constructed with the Graph Isomorphism Network (GIN) model.
- The input to the unlabeled compound GIN model is the structural content of the image data with graph node or edge attributes, i.e., the adjacency matrix A of the image data and the corresponding attribute information X.
- GIN works on the adjacency matrix of the molecular image data, the attribute information of each graph node (e.g., an atom), and the information of the connecting edges between nodes (e.g., chemical bonds).
- In each iteration, each graph node updates its own representation by aggregating the features of its neighbor nodes with its own features from the previous layer, and a nonlinear transformation is usually applied to the aggregated information. By stacking multiple layers, each graph node obtains information from neighbor nodes within the corresponding number of hops. For drug molecular image data, however, the hidden vector of a single graph node cannot represent the whole chemical molecule well.
- Therefore, pooling is finally applied to obtain a vector representation of the information of the entire graph,
- i.e., a hidden vector rich in structural information represents the overall information of the image data. This completes training of the unlabeled compound graph isomorphism network model and yields the molecular feature prediction model.
- Once the molecular feature prediction model is obtained, it performs feature extraction on the drug molecular image data to be processed to obtain the molecular structure characterization information.
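- The molecular feature prediction model can be expressed as:

$$h_v^{(k)} = \mathrm{MLP}^{(k)}\Big(\big(1+\epsilon^{(k)}\big)\, h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)}\Big), \qquad h_G = \mathrm{CONCAT}\Big(\mathrm{READOUT}\big(\{h_v^{(k)} \mid v \in G\}\big) \;\Big|\; k = 0, 1, \ldots, K\Big)$$

  where $h_v^{(k)}$ is the representation of graph node $v$ at layer $k$, $\mathcal{N}(v)$ is the set of neighbors of $v$, $\mathrm{MLP}^{(k)}$ is the multilayer perceptron of layer $k$, $h_G$ is the representation of the entire molecular graph, $\mathrm{READOUT}$ is the pooling over all graph nodes, $K$ is the number of iteration layers of the model, $G$ is the molecular graph formed by its graph nodes and connecting edges, and $\epsilon^{(k)}$ is the graph coefficient.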
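The update and readout rules of the GIN model can be sketched in NumPy as below. This is an untrained forward pass only. It assumes a fixed graph coefficient ε = 0 (the "GIN-0" variant), sum pooling as READOUT, and randomly initialized two-layer MLPs with illustrative sizes. `gin_embed`, `mlp`, and the weight layout are hypothetical names, not the authors' implementation.

```python
import numpy as np

def mlp(h, W1, b1, W2, b2):
    """Two-layer perceptron with ReLU activations (the MLP^(k) in the update rule)."""
    return np.maximum(0.0, np.maximum(0.0, h @ W1 + b1) @ W2 + b2)

def gin_embed(A, X, layers, eps=0.0):
    """Return h_G: concatenation of sum-pooled node representations for k = 0..K.

    A: (n, n) adjacency matrix without self-loops; X: (n, d0) node attributes;
    layers: list of K (W1, b1, W2, b2) weight tuples.
    """
    H = X
    readouts = [H.sum(axis=0)]  # READOUT at k = 0 (raw attributes)
    for W1, b1, W2, b2 in layers:
        # (1 + eps) * own features + sum of neighbour features (A @ H), then MLP.
        H = mlp((1.0 + eps) * H + A @ H, W1, b1, W2, b2)
        readouts.append(H.sum(axis=0))  # sum pooling over all graph nodes
    return np.concatenate(readouts)

rng = np.random.default_rng(0)
dims = [2, 8, 8]  # d0 = 2 (atomic number, charge), then two hidden layers of width 8
layers = [(rng.normal(size=(a, b)), np.zeros(b), rng.normal(size=(b, b)), np.zeros(b))
          for a, b in zip(dims[:-1], dims[1:])]
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # a 3-atom chain
X = np.array([[6, 0], [6, 0], [8, 0]], dtype=float)
h_G = gin_embed(A, X, layers)
print(h_G.shape)  # (18,)
```

Because both the neighbor sum and the readout are sums, h_G does not depend on how the atoms are numbered, which is the property that makes a graph-level embedding meaningful for molecules.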
- Before extracting protein target characterization information from the protein sequence data in step 102, the method further includes:
- Extracting protein target characterization information from the protein sequence data specifically includes:
- A language network model is trained on a large amount of unlabeled protein sequence training data to obtain a general language network model that can be transferred to support training on uncertain (unlabeled) sample data.
- In this embodiment, an unlabeled protein sequence language network model is constructed with the language representation model BERT (Bidirectional Encoder Representations from Transformers).
- BERT: Bidirectional Encoder Representations from Transformers
- The word embedding of each protein sequence is used as the input for training the unlabeled protein sequence language network model (BERT). As shown in Fig. 5, this finally yields the protein sequence target prediction model; after protein sequence data are obtained, the protein sequence target prediction model performs prediction processing on them to obtain the protein target characterization information.
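The source does not name the pre-training objective used on unlabeled protein sequences. Standard BERT pre-training uses masked-token prediction: about 15% of positions are selected, and of those 80% are replaced by a mask token, 10% by a random token, and 10% left unchanged. The Python sketch below applies that scheme per residue (so the masked fraction is only approximately 15%). The mask symbol `#` and the name `mask_sequence` are hypothetical stand-ins for BERT's `[MASK]` token and data pipeline.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "#"  # hypothetical symbol standing in for BERT's [MASK] token

def mask_sequence(seq, rate=0.15, rng=random):
    """BERT-style corruption of a protein sequence.

    Returns (corrupted token list, {position: original residue}); the dictionary
    holds the prediction targets for the masked-language-model loss.
    """
    tokens = list(seq)
    targets = {}
    for i, aa in enumerate(tokens):
        if rng.random() < rate:          # select roughly `rate` of the positions
            targets[i] = aa
            r = rng.random()
            if r < 0.8:
                tokens[i] = MASK                        # 80%: mask token
            elif r < 0.9:
                tokens[i] = rng.choice(AMINO_ACIDS)     # 10%: random residue
            # remaining 10%: keep the original residue
    return tokens, targets

rng = random.Random(0)
tokens, targets = mask_sequence("MKTAYIAKQR", rng=rng)
```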
- The method further includes: constructing the knowledge graph from a magnetic resonance diffusion tensor imaging dataset.
- The knowledge graph is constructed in advance from the magnetic resonance diffusion tensor imaging dataset.
- The magnetic resonance diffusion tensor imaging dataset is obtained by performing diffusion tensor imaging (DTI) on protein cells using magnetic resonance imaging (MRI).
- DTI: diffusion tensor imaging
- MRI: magnetic resonance imaging
- Image content showing the direction of movement between atoms is used as a kind of knowledge information to link the nodes of different molecular structure characterization information and protein target characterization information.
- The knowledge graph contains at least two nodes corresponding to different molecular structure characterization information and different protein target characterization information; two nodes are linked through an action relationship, from which the action relationship and the corresponding feature fusion coefficient are determined.
- Obtaining, from the knowledge graph, the feature fusion coefficient matching the molecular structure characterization information and the protein target characterization information includes: searching the knowledge graph for the action relationship corresponding to the molecular structure characterization information and the protein target characterization information, and converting that action relationship into the feature fusion coefficient.
- The molecular structure characterization information and the protein target characterization information can be denoted $H_1$ and $H_2$, respectively, and used to find the matching action relationship.
- The action relationship is the link between the corresponding nodes; once the link is determined, it is converted into an action-relationship weight based on the link's length.
- The feature fusion coefficient includes an interaction relationship coefficient and an affinity coefficient between the molecular structure and the protein target. For example, an interaction relationship threshold or an affinity threshold is preconfigured for the link length.
- If the link length is n times the corresponding threshold, the interaction relationship coefficient or affinity coefficient in the feature fusion coefficient is converted to n tenths (n/10) and used as the interaction relationship coefficient or affinity coefficient;
- this is not specifically limited in the embodiments of the present application.
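A minimal Python sketch of the lookup, assuming the knowledge graph is stored as a dictionary from (node, node) pairs to link lengths, that links are undirected, and that the n/10 rule is applied separately with the interaction threshold and the affinity threshold. The node identifiers, the storage layout, and `lookup_coefficients` are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical storage: (node H1, node H2) -> link length.
KnowledgeGraph = Dict[Tuple[str, str], float]

def lookup_coefficients(graph: KnowledgeGraph, h1: str, h2: str,
                        interaction_threshold: float,
                        affinity_threshold: float) -> Optional[Dict[str, float]]:
    """Find the link between H1 and H2 and convert its length into coefficients."""
    length = graph.get((h1, h2), graph.get((h2, h1)))  # links treated as undirected
    if length is None:
        return None  # no link: no action relationship recorded for this pair
    return {
        "interaction": (length / interaction_threshold) / 10,  # n times -> n/10
        "affinity": (length / affinity_threshold) / 10,
    }

kg = {("drug:H1", "target:H2"): 4.0}
print(lookup_coefficients(kg, "drug:H1", "target:H2", 2.0, 4.0))
# {'interaction': 0.2, 'affinity': 0.1}
```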
- Before the step of performing prediction processing on the feature-fused molecular structure characterization information and protein target characterization information based on the trained dual-task prediction model and using the obtained prediction result as the drug-target action relationship,
- the method further includes: constructing a two-layer feed-forward neural network model, and training it on feature-fused training sample data to obtain the trained dual-task prediction model.
- Specifically, a two-layer feed-forward neural network model is constructed, and the molecular structure characterization information and protein target characterization information in the feature-fused training sample data are used as input parameters of the dual-task prediction model for training, yielding the trained dual-task prediction model.
- The dual-task prediction model performs dual-output processing comprising a classification prediction task and a regression prediction task, yielding a drug-target action relationship that includes an interaction relationship prediction result and an affinity prediction result.
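The dual-output structure can be sketched as a shared two-layer feed-forward trunk with a classification head (interaction probability) and a regression head (affinity). This NumPy forward pass and loss rest on several assumptions not in the source: the two inputs are concatenated, the classification head uses a sigmoid with binary cross-entropy, the regression head uses mean squared error, and the two losses are summed with a weight `alpha`. All names and sizes are hypothetical, and the training loop is omitted.

```python
import numpy as np

def dual_task_forward(drug_in, target_in, params):
    """Shared two-layer feed-forward network with two output heads."""
    x = np.concatenate([drug_in, target_in], axis=-1)     # combine the two inputs
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])  # layer 1 (ReLU)
    h = np.maximum(0.0, h @ params["W2"] + params["b2"])  # layer 2 (ReLU)
    logit = h @ params["w_cls"] + params["b_cls"]         # classification head
    interaction_prob = 1.0 / (1.0 + np.exp(-logit))       # P(interaction)
    affinity = h @ params["w_reg"] + params["b_reg"]      # regression head
    return interaction_prob, affinity

def dual_task_loss(prob, affinity, y_cls, y_reg, alpha=1.0):
    """Binary cross-entropy + alpha * mean squared error (weighting is an assumption)."""
    p = np.clip(prob, 1e-7, 1 - 1e-7)  # avoid log(0)
    bce = -np.mean(y_cls * np.log(p) + (1 - y_cls) * np.log(1 - p))
    mse = np.mean((affinity - y_reg) ** 2)
    return bce + alpha * mse

rng = np.random.default_rng(0)
d_drug, d_target, d_hidden = 18, 16, 32  # illustrative sizes only
params = {
    "W1": rng.normal(scale=0.1, size=(d_drug + d_target, d_hidden)), "b1": np.zeros(d_hidden),
    "W2": rng.normal(scale=0.1, size=(d_hidden, d_hidden)), "b2": np.zeros(d_hidden),
    "w_cls": rng.normal(scale=0.1, size=d_hidden), "b_cls": 0.0,
    "w_reg": rng.normal(scale=0.1, size=d_hidden), "b_reg": 0.0,
}
prob, aff = dual_task_forward(rng.random((4, d_drug)), rng.random((4, d_target)), params)
loss = dual_task_loss(prob, aff, y_cls=np.array([1, 0, 1, 0]),
                      y_reg=np.array([0.7, 0.1, 0.5, 0.2]))
```

Sharing the trunk lets both tasks learn from the same fused features, while the separate heads keep the probability and the continuous affinity value on their own scales.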
- The interaction relationship prediction result is the interaction relationship coefficient between the molecular structure and the protein target.
- The affinity prediction result is the relationship between the molecular structure and the protein target in the absence of resistance, expressed as the affinity coefficient between the molecular structure and the protein target.
- Here, resistance means that the drug's molecular structure has a therapeutic or alleviating effect on the protein target, so that the drug can be used to treat or antagonize the protein target; further, affinity denotes mutual promotion or a beneficial effect between the drug's molecular structure and the protein target.
- In the embodiments of the present application, both the interaction relationship prediction result and the affinity prediction result are expressed numerically, which is not specifically limited.
- The feature fusion coefficients matching the molecular structure characterization information and the protein target characterization information, namely the interaction relationship coefficient and the affinity coefficient, are used for feature fusion, whereas the interaction relationship prediction result and
- the affinity prediction result are obtained through AI-based prediction. The feature fusion coefficients therefore differ from the predicted interaction relationship and affinity results; this is not specifically limited in the embodiments of the present application.
- After step 104 of performing prediction processing on the feature-fused molecular structure characterization information and protein target characterization information based on the trained dual-task prediction model and using the obtained prediction result as the drug-target action relationship,
- the method further includes: calling a preset drug-target associated structure image database; searching the database for drug molecule associated structure image data that match the drug-target action relationship;
- and outputting the matched drug molecule associated structure image data.
- the matching is performed against the preset drug-target associated structure image database in the intelligent medical system.
- the preset drug-target associated structure image database stores drug molecule associated structure image data matched with different interaction coefficients and different affinity coefficients.
- the interaction coefficient and the affinity coefficient of the predicted drug-target action relationship are compared against each interaction coefficient and affinity coefficient in the preset drug-target associated structure image database. This comparison yields the matching drug molecule associated structure image data.
- the matched drug molecule associated structure image data can be output and pushed to the operating user as other drugs associated with the target drug. The operator can then perform further drug operations based on it; the embodiments impose no specific limitation on this.
- the method also includes a fallback step. If no drug molecule associated structure image data matching the drug-target action relationship is found in the preset drug-target associated structure image database, the method outputs the drug-target action relationship containing the interaction prediction result and the affinity prediction result. This output indicates that the drug-target action relationship should be matched manually.
- the embodiment of the present application provides an artificial intelligence-based method for determining drug-target action relationships.
- the embodiment of the present application acquires the drug molecule image data and protein sequence data of the target drug. It extracts molecular structure characterization information from the drug molecule image data and protein target characterization information from the protein sequence data. It then obtains, from the knowledge graph, feature fusion coefficients matching the molecular structure characterization information and the protein target characterization information, and performs feature fusion on both based on those coefficients.
- finally, based on the trained dual-task prediction model, it performs prediction processing on the fused molecular structure characterization information and protein target characterization information, and takes the prediction result as the drug-target action relationship. This increases the diversity with which the action relationships between drug targets and diseases can be confirmed, and meets the need to confirm drug targets across multiple diseases. As a result, the accuracy of confirming drug-target action relationships improves, and the efficiency of drug use is greatly improved.
- the embodiment of the present application provides an artificial intelligence-based device for determining drug-target action relationships; as shown in FIG. 6, the device includes:
- an acquisition module 41, configured to acquire drug molecule image data and protein sequence data of the target drug;
- an extraction module 42, configured to extract molecular structure characterization information from the drug molecule image data, and extract protein target characterization information from the protein sequence data;
- a determination module 43, configured to obtain, from the knowledge graph, a feature fusion coefficient matching the molecular structure characterization information and the protein target characterization information, and to perform feature fusion on both based on the feature fusion coefficient;
- a processing module 44, configured to perform prediction processing on the fused molecular structure characterization information and protein target characterization information based on the trained dual-task prediction model, and take the prediction result as the drug-target action relationship.
- the device further includes a first construction module and a first training module:
- the first construction module is configured to construct an unlabeled compound graph isomorphism network model;
- the first training module is configured to train the unlabeled compound graph isomorphism network model to obtain a trained molecular feature prediction model. The model's input parameters are the adjacency matrix, the attribute information and the connecting edges in the drug molecule graph training data;
- the processing unit is configured to perform prediction processing on the drug molecule image data based on the trained molecular feature prediction model to obtain molecular structure characterization information.
- the device further includes a second construction module and a second training module:
- the second construction module is configured to construct an unlabeled protein sequence language network model;
- the second training module is configured to train the unlabeled protein sequence language network model to obtain a trained protein sequence target prediction model. The model's input parameters are the word-embedded protein sequence training data;
- the processing unit is configured to perform prediction processing on the protein sequence data based on the trained protein sequence target prediction model to obtain protein target characterization information.
- the device also includes:
- a third construction module, configured to construct a knowledge graph based on the magnetic resonance diffusion tensor imaging data set. The knowledge graph contains at least two nodes, corresponding respectively to different molecular structure characterization information and different protein target characterization information, and two nodes are linked by an action relationship;
- the determination module is specifically configured to search the knowledge graph for the action relationship corresponding to the molecular structure characterization information and the protein target characterization information, and convert that action relationship into a feature fusion coefficient;
- the feature fusion coefficient includes the interaction coefficient and the affinity coefficient between the molecular structure and the protein target.
- the device also includes:
- a third training module, configured to construct a two-layer feed-forward neural network model and train it on feature fusion training sample data to obtain a trained dual-task prediction model, wherein
- the input parameters of the dual-task prediction model are the fused molecular structure characterization information and protein target characterization information;
- the dual-task prediction model performs dual-output processing comprising a classification prediction task and a regression prediction task. Its output is a drug-target action relationship containing an interaction prediction result and an affinity prediction result.
- the device also includes:
- a retrieval module, configured to retrieve a preset drug-target associated structure image database, which stores drug molecule associated structure image data matched with different interaction coefficients and different affinity coefficients;
- an output module, configured to search the preset drug-target associated structure image database for drug molecule associated structure image data matching the drug-target action relationship, and output that data.
- the output module also handles the case where no matching drug molecule associated structure image data is found in the preset drug-target associated structure image database. In that case, it outputs the drug-target action relationship containing the interaction prediction result and the affinity prediction result, to indicate that the drug-target action relationship should be matched manually.
- the embodiment of the present application provides an artificial intelligence-based device for determining drug-target action relationships.
- the embodiment of the present application acquires the drug molecule image data and protein sequence data of the target drug. It extracts molecular structure characterization information from the drug molecule image data and protein target characterization information from the protein sequence data. It then obtains, from the knowledge graph, feature fusion coefficients matching the molecular structure characterization information and the protein target characterization information, and performs feature fusion on both based on those coefficients.
- finally, based on the trained dual-task prediction model, it performs prediction processing on the fused molecular structure characterization information and protein target characterization information, and takes the prediction result as the drug-target action relationship. This increases the diversity with which the action relationships between drug targets and diseases can be confirmed, and meets the need to confirm drug targets across multiple diseases. As a result, the accuracy of confirming drug-target action relationships improves, and the efficiency of drug use is greatly improved.
- a computer-readable storage medium stores at least one executable instruction. The computer-executable instruction can perform the artificial intelligence-based method for determining drug-target action relationships in any of the above method embodiments.
- the computer-readable storage medium may be non-volatile or volatile.
- FIG. 7 shows a schematic structural diagram of a computer device provided according to an embodiment of the present application.
- the specific embodiment of the present application does not limit the specific implementation of the computer device.
- the computer device may include: a processor (processor) 502, a communication interface (Communications Interface) 504, a memory (memory) 506, and a communication bus 508.
- the processor 502 , the communication interface 504 , and the memory 506 communicate with each other through the communication bus 508 .
- the communication interface 504 is configured to communicate with network elements of other devices such as clients or other servers.
- the processor 502 is configured to execute the program 510, specifically, it can execute the relevant steps in the above-mentioned embodiment of the method for determining the drug-target action relationship based on artificial intelligence.
- the program 510 may include program codes including computer operation instructions.
- the processor 502 may be a central processing unit CPU, or an ASIC (Application Specific Integrated Circuit), or one or more integrated circuits configured to implement the embodiments of the present application.
- the one or more processors included in the computer device may be of the same type, such as one or more CPUs, or may be different types of processors, such as one or more CPUs and one or more ASICs.
- the memory 506 is used for storing the program 510 .
- the memory 506 may include high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
- the program 510 can specifically be used to make the processor 502 perform the following operations:
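- acquiring drug molecule image data and protein sequence data of the target drug; extracting molecular structure characterization information from the drug molecule image data, and protein target characterization information from the protein sequence data; obtaining, from the knowledge graph, feature fusion coefficients matching the molecular structure characterization information and the protein target characterization information, and performing feature fusion on both based on those coefficients; and performing prediction processing on the fused molecular structure characterization information and protein target characterization information based on the trained dual-task prediction model, taking the prediction results as the drug-target action relationship.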
- each module or each step of the above-mentioned application can be realized by a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network composed of multiple computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device. In some cases, the steps shown or described may be carried out in an order different from that shown here. They may also be separately fabricated into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module.
- the present application is not limited to any specific combination of hardware and software.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Chemical & Material Sciences (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Crystallography & Structural Chemistry (AREA)
- Multimedia (AREA)
- Mathematical Physics (AREA)
- Molecular Biology (AREA)
- Medicinal Chemistry (AREA)
- Pharmacology & Pharmacy (AREA)
- Biomedical Technology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
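This application discloses an artificial-intelligence-based method and device for determining drug-target action relationships, in the technical field of intelligent medical processing. Its main purpose is to solve the existing problem that the action relationships of drug targets cannot be accurately confirmed. The method includes the following steps.

1. Acquire drug molecule image data and protein sequence data of a target drug.
2. Extract molecular structure characterization information from the drug molecule image data, and protein target characterization information from the protein sequence data.
3. Obtain, from a knowledge graph, a feature fusion coefficient matching the molecular structure characterization information and the protein target characterization information. Perform feature fusion on both based on the feature fusion coefficient.
4. Perform prediction processing on the fused molecular structure characterization information and protein target characterization information based on a trained dual-task prediction model. Take the resulting prediction result as the drug-target action relationship.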
Description
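This application claims priority to Chinese patent application No. 202210028223.3, entitled "Artificial-intelligence-based method and device for determining drug-target action relationships" and filed with the Chinese Patent Office on January 11, 2022. The entire contents of that application are incorporated herein by reference.
This application relates to the technical field of intelligent medical processing, and in particular to an artificial-intelligence-based method and device for determining drug-target action relationships.
In recent years, intelligent medical technology has gradually expanded from clinical treatment toward drug research and development. More and more artificial intelligence techniques are used to analyze how suitable drugs are for different diseases, so that drug targets can be identified accurately. In particular, research on the molecular structure of drugs is used to determine, from drug features, the action relationships between suitable drug targets and different diseases. These relationships then serve as a basis for treatment.
The inventors realized that the action relationship of a drug target with respect to different diseases is currently determined directly from the chemical properties of the drug molecule. However, action relationships confirmed from chemical properties alone are one-dimensional, and cannot be applied effectively to confirming drug targets across multiple diseases. As a result, drug-target action relationships are confirmed with poor accuracy and drugs are used inefficiently. An artificial-intelligence-based method for determining drug-target action relationships is therefore urgently needed to solve these problems.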
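Summary of the Invention
In view of this, this application provides an artificial-intelligence-based method and device for determining drug-target action relationships. Its main purpose is to solve the existing problem that the action relationships of drug targets cannot be accurately confirmed.
According to one aspect of this application, an artificial-intelligence-based method for determining drug-target action relationships is provided, including:
acquiring drug molecule image data and protein sequence data of a target drug;
extracting molecular structure characterization information from the drug molecule image data, and extracting protein target characterization information from the protein sequence data;
obtaining, from a knowledge graph, a feature fusion coefficient matching the molecular structure characterization information and the protein target characterization information, and performing feature fusion on both based on the feature fusion coefficient;
performing prediction processing on the fused molecular structure characterization information and protein target characterization information based on a trained dual-task prediction model, and taking the resulting prediction result as the drug-target action relationship.
According to another aspect of this application, an artificial-intelligence-based device for determining drug-target action relationships is provided, including:
an acquisition module, configured to acquire drug molecule image data and protein sequence data of a target drug;
an extraction module, configured to extract molecular structure characterization information from the drug molecule image data, and extract protein target characterization information from the protein sequence data;
a determination module, configured to obtain, from a knowledge graph, a feature fusion coefficient matching the molecular structure characterization information and the protein target characterization information, and to perform feature fusion on both based on the feature fusion coefficient;
a processing module, configured to perform prediction processing on the fused molecular structure characterization information and protein target characterization information based on a trained dual-task prediction model, and take the resulting prediction result as the drug-target action relationship.
According to yet another aspect of this application, a computer-readable storage medium is provided, storing computer-readable instructions which, when executed by a processor, implement an artificial-intelligence-based method for determining drug-target action relationships, including:
acquiring drug molecule image data and protein sequence data of a target drug;
extracting molecular structure characterization information from the drug molecule image data, and extracting protein target characterization information from the protein sequence data;
obtaining, from a knowledge graph, a feature fusion coefficient matching the molecular structure characterization information and the protein target characterization information, and performing feature fusion on both based on the feature fusion coefficient;
performing prediction processing on the fused molecular structure characterization information and protein target characterization information based on a trained dual-task prediction model, and taking the resulting prediction result as the drug-target action relationship.
According to still another aspect of this application, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When executed by the processor, the computer-readable instructions implement an artificial-intelligence-based method for determining drug-target action relationships, including:
acquiring drug molecule image data and protein sequence data of a target drug;
extracting molecular structure characterization information from the drug molecule image data, and extracting protein target characterization information from the protein sequence data;
obtaining, from a knowledge graph, a feature fusion coefficient matching the molecular structure characterization information and the protein target characterization information, and performing feature fusion on both based on the feature fusion coefficient;
performing prediction processing on the fused molecular structure characterization information and protein target characterization information based on a trained dual-task prediction model, and taking the resulting prediction result as the drug-target action relationship.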
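With the above technical solutions, the technical solutions provided by the embodiments of this application have at least the following advantages:
This application provides an artificial-intelligence-based method and device for determining drug-target action relationships. Compared with the prior art, the embodiments of this application:
- acquire drug molecule image data and protein sequence data of a target drug;
- extract molecular structure characterization information from the drug molecule image data, and protein target characterization information from the protein sequence data;
- obtain, from a knowledge graph, a feature fusion coefficient matching the molecular structure characterization information and the protein target characterization information, and perform feature fusion on both based on the feature fusion coefficient;
- perform prediction processing on the fused molecular structure characterization information and protein target characterization information based on a trained dual-task prediction model, taking the resulting prediction result as the drug-target action relationship.

This increases the diversity with which the action relationships between drug targets and diseases can be confirmed, and meets the need to confirm drug targets across multiple diseases. As a result, the accuracy of confirming drug-target action relationships improves, and the efficiency of drug use is greatly improved.
The above description is only an overview of the technical solutions of this application. Specific embodiments are set forth below, so that the technical means of this application can be understood more clearly and implemented according to the specification. They also make the above and other objects, features and advantages of this application more apparent.
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the detailed description of the preferred embodiments below. The drawings are only intended to illustrate the preferred embodiments, and are not to be considered a limitation of this application. Throughout the drawings, the same reference signs denote the same components. In the drawings: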
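FIG. 1 shows a flowchart of an artificial-intelligence-based method for determining drug-target action relationships provided by an embodiment of this application;
FIG. 2 shows a schematic structural diagram of a dual-task prediction model provided by an embodiment of this application;
FIG. 3 shows a flowchart of another artificial-intelligence-based method for determining drug-target action relationships provided by an embodiment of this application;
FIG. 4 shows a flowchart of yet another artificial-intelligence-based method for determining drug-target action relationships provided by an embodiment of this application;
FIG. 5 shows a schematic structural diagram of a protein sequence target prediction model provided by an embodiment of this application;
FIG. 6 shows a block diagram of an artificial-intelligence-based device for determining drug-target action relationships provided by an embodiment of this application;
FIG. 7 shows a schematic structural diagram of a computer device provided by an embodiment of this application.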
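Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments, the present disclosure can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure is understood more thoroughly and its scope is fully conveyed to those skilled in the art.
The embodiments of this application may acquire and process the relevant data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, methods, technology and application systems that use digital computers, or machines controlled by digital computers, to:
- simulate, extend and expand human intelligence;
- perceive the environment;
- acquire knowledge and use it to obtain optimal results.

Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big-data processing, operating/interaction systems, and mechatronics. AI software technologies mainly cover computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.
On this basis, in one embodiment, as shown in FIG. 1, an artificial-intelligence-based method for determining drug-target action relationships is provided. The method is described as applied to a computer device such as a server. The server may be an independent server, or a cloud server providing basic cloud computing services, for example an intelligent medical system or a digital medical platform. Such services include cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big-data and AI platforms. The method includes the following steps:
101. Acquire drug molecule image data and protein sequence data of a target drug.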
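In the embodiments of this application, the executing entity may be an intelligent management system with data processing capability, for example an intelligent medical system or a digital medical platform. Illustratively, the current executing entity is an intelligent medical system. The target drug is a drug whose features are to be matched against proteins to determine drug-target action relationships.

The drug molecular structure image data of the target drug represents the target drug's molecule as a graph. Its image content is the atom-bond structure of the target drug molecule. From this content, molecular structure features can be abstracted, such as node-edge spatial features, atomic numbers and charges.

The protein sequence data represents a protein as an arrangement of 20 different letters (amino acids). Protein sequences are generally hundreds or thousands of residues long, and the protein sequence data is the ordered sequence of the letters for all of its amino acids. For example, with letters such as glycine-g, alanine-b and valine-j, a sequence is written b1-j2-g3-b4.... Feature extraction is then performed on the drug molecule image data and the protein sequence data.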
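The atom-bond graph described above can be sketched as follows. This is a minimal illustration under stated assumptions:

- The structure image is assumed to have already been converted into an atom list and a bond list. Image-to-structure recognition is outside the sketch.
- The input tuples and the function name `molecule_to_graph` are hypothetical.
- The node features (atomic number, formal charge, degree) are chosen only to mirror the atomic number and charge features mentioned in the text.

```python
from typing import List, Tuple

# Hypothetical input format (not from the patent): heavy atoms as
# (atomic_number, formal_charge) and bonds as (i, j, bond_order).
Atom = Tuple[int, int]
Bond = Tuple[int, int, int]


def molecule_to_graph(atoms: List[Atom], bonds: List[Bond]):
    """Build the adjacency matrix A, node feature matrix X and edge list E."""
    n = len(atoms)
    A = [[0] * n for _ in range(n)]
    E = []
    for i, j, order in bonds:
        if i == j or not (0 <= i < n and 0 <= j < n):
            raise ValueError(f"invalid bond between atoms {i} and {j}")
        A[i][j] = A[j][i] = 1  # chemical bonds are undirected edges
        E.append((i, j, order))
    # Node features per atom: [atomic number, formal charge, degree]
    X = [[z, q, sum(A[k])] for k, (z, q) in enumerate(atoms)]
    return A, X, E


# Ethanol heavy-atom skeleton C-C-O, single bonds only
A, X, E = molecule_to_graph([(6, 0), (6, 0), (8, 0)], [(0, 1, 1), (1, 2, 1)])
```

The resulting `A` and `X` correspond to the adjacency matrix and attribute information that the graph isomorphism network described later takes as input.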
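Note that the intelligent medical system (the current executing entity) loads the drug molecular structure image data in the embodiments of this application. The image data of the target drug is first generated with computer software for drawing molecular structure diagrams. An operator can then either:
- obtain drug molecular structure image data matching the target drug from a drug database already stored in the current intelligent medical system; or
- create it with a molecular structure drawing application and import it in a file format specified by the intelligent medical system.

The embodiments of this application impose no specific limitation on this. Likewise, the protein sequence data may be entered in advance by an operator, or loaded directly from existing protein sequence data; no specific limitation is imposed.
102. Extract molecular structure characterization information from the drug molecule image data, and extract protein target characterization information from the protein sequence data.
Features are extracted separately from the drug molecule image data and the protein sequence data. This improves the accuracy of matching the feature fusion coefficient between them. Molecular structure characterization information is extracted from the drug molecule image data, and protein target characterization information is extracted from the protein sequence.

The molecular structure characterization information describes the main or specific molecular structure features in the drug molecule image data. The protein target characterization information describes the main or specific protein target features in the protein sequence data. Specifically, a protein target is the site on the protein sequence where the drug is expected to bind. It is used to judge whether an action relationship exists with the target drug.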
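103. Obtain, from a knowledge graph, a feature fusion coefficient matching the molecular structure characterization information and the protein target characterization information, and perform feature fusion on both based on the feature fusion coefficient.
In the embodiments of this application, the aim is to determine the action relationship between a drug and a protein target more accurately. Before processing with the dual-task prediction model, the feature fusion coefficients for the extracted molecular structure characterization information and protein target characterization information are therefore determined from a knowledge graph and used for feature fusion.

Specifically, the knowledge graph is constructed from a magnetic resonance diffusion tensor imaging (DTI) data set. It contains at least two nodes, corresponding respectively to different molecular structure characterization information and different protein target characterization information. A link between two nodes indicates that an action relationship exists between them. The feature fusion coefficient is computed from the action relationship represented by the link, and is then used for feature fusion.

The length of a link represents the strength of the action relationship: the longer the link, the weaker the relationship. A threshold is preconfigured for link length. If the link length is N times the threshold, the feature fusion coefficient is N/10, which gives the feature fusion coefficient.

Once the feature fusion coefficient is determined, feature fusion is performed so that the molecular structure characterization information and protein target characterization information can be processed by the dual-task prediction model. When each is converted into a feature vector, both feature vectors are normalized into one numerical range according to the feature fusion coefficient. The results can then serve as input parameters of the dual-task prediction model for prediction processing.

For example, the molecular structure characterization information and the protein target characterization information are each converted into feature vectors, that is, the text or image data is converted into vector matrices. During conversion, each vector matrix is multiplied by the feature fusion coefficient, used as the conversion coefficient. This yields feature vector matrices of the molecular structure and protein target characterization information within one numerical range, which are then used as input parameters of the dual-task prediction model. The embodiments of this application impose no specific limitation on this.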
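104. Perform prediction processing on the fused molecular structure characterization information and protein target characterization information based on a trained dual-task prediction model, and take the resulting prediction result as the drug-target action relationship.
In the embodiments of this application, the dual-task prediction model is a hybrid neural network model with two input parameters and one output result. As shown in FIG. 2, the model's two inputs are:
- the molecular structure characterization information Drug1, containing the molecular structure;
- the protein target characterization information Target1, containing the protein sequence.

Each is first converted into a feature vector. The trained dual-task prediction model then performs prediction, and the prediction result is taken as the drug-target action relationship. The drug-target action relationship is expressed as a numerical value of the action relationship between the drug molecular features and the protein target features.

For example, suppose the drug molecular features of target drug a include atom s-chemical bond 2. Suppose also that its drug-target action relationship with protein target b4-j6 is 0.4. Here alanine-b and valine-j denote alanine at position 4 and valine at position 6. A value of 0.4 indicates that the drug-target action relationship is weak. Alternatively, a preset drug-target action relationship threshold may be used to decide whether a strong relationship exists. The embodiments of this application impose no specific limitation on this.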
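In one embodiment of this application, for further definition and explanation, as shown in FIG. 3, the method further includes the following steps before extracting molecular structure characterization information from the drug molecule image data in step 102:
201. Construct an unlabeled compound graph isomorphism network model;
202. Train the unlabeled compound graph isomorphism network model to obtain a trained molecular feature prediction model. The input parameters are the adjacency matrix, the attribute information and the connecting edges in the drug molecule graph training data;
Correspondingly, extracting molecular structure characterization information from the drug molecule image data specifically includes:
203. Perform prediction processing on the drug molecule image data based on the trained molecular feature prediction model to obtain molecular structure characterization information.
In the embodiments of this application, a graph isomorphism network model is trained on a large amount of unlabeled drug molecule graph training data. This yields a general-purpose graph isomorphism network model that can be transferred, and supports training sample data of variable size. The embodiments construct the unlabeled compound graph isomorphism network model as an unlabeled compound GIN model, based on the Graph Isomorphism Network (GIN). Its input is graph data with node or edge attributes, namely the adjacency matrix A of the graph data and the corresponding attribute information X.

Chemical molecular graph training data of drugs serves as the training sample data. GIN operates on three kinds of information:
- the adjacency matrix of the molecular graph;
- the attribute information of each graph node (for example, an atom);
- the information of the edges between nodes (for example, chemical bonds).

In each iteration, every graph node updates its information by aggregating the features of its neighboring nodes together with its own features from the previous layer. A nonlinear transformation is usually applied to the aggregated information. By stacking multiple layers, each graph node can gather information from neighbors within the corresponding number of hops.

For drug molecule graph data, the hidden vector of a single node does not represent the chemical molecule well. To represent the molecule as a whole from the topology of the graph, a final pooling operation produces a vector representation of the entire graph. This is a single, structure-rich hidden vector that represents the whole graph. Training the unlabeled compound graph isomorphism network model in this way yields the molecular feature prediction model. That model is then applied to the drug molecule image data to be processed, to extract the molecular structure characterization information.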
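In one embodiment of this application, for further definition and explanation, as shown in FIG. 4, the method further includes the following steps before extracting protein target characterization information from the protein sequence data in step 102:
301. Construct an unlabeled protein sequence language network model;
302. Train the unlabeled protein sequence language network model to obtain a trained protein sequence target prediction model. The input parameters are the word-embedded protein sequence training data;
Correspondingly, extracting protein target characterization information from the protein sequence data specifically includes:
303. Perform prediction processing on the protein sequence data based on the trained protein sequence target prediction model to obtain protein target characterization information.
In the embodiments of this application, a language network model is trained on a large amount of unlabeled protein sequence training data. This yields a general-purpose language network model that can be transferred, and supports training sample data of variable size. The embodiments construct the unlabeled protein sequence language network model with the language representation model BERT (Bidirectional Encoder Representations from Transformers).

This BERT model is trained on a large amount of unlabeled protein sequence training data. A protein is an arrangement of 20 different letters (amino acids), and protein sequences are hundreds or thousands of residues long. Each protein sequence is therefore word-embedded and used as input to BERT for training, as shown in FIG. 5. Training yields the protein sequence target prediction model. After protein sequence data is acquired, the protein sequence target prediction model performs prediction processing on it to obtain protein target characterization information.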
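In one embodiment of this application, for further definition and explanation, the method includes an extra step before step 103, in which the feature fusion coefficient matching the molecular structure characterization information and the protein target characterization information is obtained from the knowledge graph. That step is: constructing the knowledge graph based on a magnetic resonance diffusion tensor imaging data set.
In the embodiments of this application, the knowledge graph is constructed in advance from a magnetic resonance diffusion tensor imaging data set, so that the feature fusion coefficient can be determined from it. This data set is obtained by diffusion tensor imaging (DTI) of protein cells using magnetic resonance imaging (MRI). Scanning molecules, atoms and the like yields images that contain the movement directions between molecules and atoms. These images serve as knowledge for linking the nodes of different molecular structure characterization information and protein target characterization information.

Specifically, the knowledge graph contains at least two nodes, corresponding respectively to different molecular structure characterization information and different protein target characterization information. Two nodes are linked by an action relationship, so that the action relationship and the corresponding feature fusion coefficient can be determined from the link.
Correspondingly, obtaining the matching feature fusion coefficient from the knowledge graph includes: searching the knowledge graph for the action relationship corresponding to the molecular structure characterization information and the protein target characterization information, and converting that action relationship into the feature fusion coefficient.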
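In the embodiments of this application, the drug molecular structure image data is represented as $G=(A,X)$. Here $A$ and $X$ denote the adjacency matrix and the feature matrix respectively. The graph has $n$ nodes with $d$-dimensional node features, so $A\in\mathbb{R}^{n\times n}$ and $X\in\mathbb{R}^{n\times d}$. The molecular feature prediction model GIN learns $f$-dimensional outputs for the graph nodes, and these are pooled into one graph-level output $H_1=\mathrm{Pool}(\mathrm{GIN}(A,X))\in\mathbb{R}^{f\times 1}$. For a protein sequence $S$, the protein sequence target prediction model BERT learns an $f$-dimensional output for each sequence, $H_2=\mathrm{BERT}(S)\in\mathbb{R}^{f\times 1}$.

On this basis, each piece of molecular structure characterization information in the knowledge graph can be represented by $H_1$, and each piece of protein target characterization information by $H_2$. This allows the matching action relationship to be found, and that action relationship is the link between the corresponding nodes. Once the link is determined, its distance is converted into an action relationship weight.

The feature fusion coefficient includes the interaction coefficient and the affinity coefficient between the molecular structure and the protein target. For example, an interaction threshold or an affinity threshold is preconfigured for link length. If the link length is N times the threshold, the interaction coefficient or affinity coefficient is N/10, and this value is used as the interaction coefficient or affinity coefficient. The embodiments of this application impose no specific limitation on this.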
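In one embodiment of this application, for further definition and explanation, the method includes an extra step before the prediction step. The prediction step performs prediction processing on the fused molecular structure characterization information and protein target characterization information based on the trained dual-task prediction model, and takes the prediction result as the drug-target action relationship. The extra step is: constructing a two-layer feed-forward neural network model, and training it on feature fusion training sample data to obtain the trained dual-task prediction model.
The goal is to predict both objectives with one dual-input prediction model, and thereby improve the determination of the interaction between them. As shown in FIG. 2, a two-layer feed-forward neural network model is constructed. It is trained with the fused molecular structure characterization information and protein target characterization information in the feature fusion training sample data as input parameters, yielding the trained dual-task prediction model. The dual-task prediction model performs dual-output processing comprising a classification prediction task and a regression prediction task. Its output is a drug-target action relationship containing an interaction prediction result and an affinity prediction result.

The interaction prediction result is the interaction coefficient between the molecular structure and the protein target. The affinity prediction result is the relationship between the molecular structure and the protein target in the absence of antagonism, expressed as the affinity coefficient between them. Here, antagonism means that the drug's molecular structure has a therapeutic or alleviating effect on the protein target, so that the drug can be used to treat or antagonize the protein target. Affinity, by contrast, denotes a mutually promoting or beneficial effect between the drug's molecular structure and the protein target. In the embodiments of this application, both results are expressed numerically, and no specific limitation is imposed on this.

The feature fusion coefficient is determined from the knowledge graph to match the molecular structure characterization information and protein target characterization information, and contains an interaction coefficient and an affinity coefficient. It serves only for feature fusion. The interaction prediction result and the affinity prediction result, by contrast, are obtained by AI-based processing. The feature fusion coefficient is therefore distinct from the predicted interaction and affinity results. The embodiments of this application impose no specific limitation on this.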
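In one embodiment of this application, for further definition and explanation, the method includes further steps after step 104. In step 104, the trained dual-task prediction model performs prediction processing on the fused molecular structure characterization information and protein target characterization information, and the prediction result is taken as the drug-target action relationship. The further steps are:
- retrieving a preset drug-target associated structure image database;
- searching that database for drug molecule associated structure image data matching the drug-target action relationship, and outputting it.

Operators may need other drugs that have an interaction or affinity association with the target drug. To meet this need, once the drug-target action relationship has been obtained, it is matched against the preset drug-target associated structure image database in the intelligent medical system. This database stores drug molecule associated structure image data matched with different interaction coefficients and different affinity coefficients.

The interaction coefficient and the affinity coefficient in the predicted drug-target action relationship are compared with each interaction coefficient and affinity coefficient in the database, which yields the matching drug molecule associated structure image data. The matched data can then be output and pushed to the operating user as other drugs associated with the target drug. The operator can use it to carry out further drug operations. The embodiments of this application impose no specific limitation on this.
In one embodiment of this application, for further definition and explanation, the method further includes a fallback step. If no drug molecule associated structure image data matching the drug-target action relationship is found in the preset drug-target associated structure image database, the method outputs the drug-target action relationship containing the interaction prediction result and the affinity prediction result. This output indicates that the drug-target action relationship should be matched manually.
This fallback makes drug-target action determination more effective, and lets drug-target action relationships be determined flexibly for drug molecule associated structure image data. If the preset database of the intelligent medical system contains no matching drug molecule associated structure image data, the system has no associated drug molecules to push. The drug-target action relationship containing the interaction and affinity prediction results is therefore output directly, prompting the operator to match the drug-target action relationship manually.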
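The embodiments of this application provide an artificial-intelligence-based method for determining drug-target action relationships. Compared with the prior art, the embodiments of this application:
- acquire drug molecule image data and protein sequence data of a target drug;
- extract molecular structure characterization information from the drug molecule image data, and protein target characterization information from the protein sequence data;
- obtain, from a knowledge graph, a feature fusion coefficient matching the molecular structure characterization information and the protein target characterization information, and perform feature fusion on both based on the feature fusion coefficient;
- perform prediction processing on the fused molecular structure characterization information and protein target characterization information based on a trained dual-task prediction model, taking the resulting prediction result as the drug-target action relationship.

This increases the diversity with which the action relationships between drug targets and diseases can be confirmed, and meets the need to confirm drug targets across multiple diseases. As a result, the accuracy of confirming drug-target action relationships improves, and the efficiency of drug use is greatly improved.
Further, as an implementation of the method shown in FIG. 1, an embodiment of this application provides an artificial-intelligence-based device for determining drug-target action relationships. As shown in FIG. 6, the device includes:
an acquisition module 41, configured to acquire drug molecule image data and protein sequence data of a target drug;
an extraction module 42, configured to extract molecular structure characterization information from the drug molecule image data, and extract protein target characterization information from the protein sequence data;
a determination module 43, configured to obtain, from a knowledge graph, a feature fusion coefficient matching the molecular structure characterization information and the protein target characterization information, and to perform feature fusion on both based on the feature fusion coefficient;
a processing module 44, configured to perform prediction processing on the fused molecular structure characterization information and protein target characterization information based on a trained dual-task prediction model, and take the resulting prediction result as the drug-target action relationship.
Further, the device also includes a first construction module and a first training module:
the first construction module is configured to construct an unlabeled compound graph isomorphism network model;
the first training module is configured to train the unlabeled compound graph isomorphism network model to obtain a trained molecular feature prediction model. The input parameters are the adjacency matrix, the attribute information and the connecting edges in the drug molecule graph training data;
the processing unit is configured to perform prediction processing on the drug molecule image data based on the trained molecular feature prediction model to obtain molecular structure characterization information.
Further, the device also includes a second construction module and a second training module:
the second construction module is configured to construct an unlabeled protein sequence language network model;
the second training module is configured to train the unlabeled protein sequence language network model to obtain a trained protein sequence target prediction model. The input parameters are the word-embedded protein sequence training data;
the processing unit is configured to perform prediction processing on the protein sequence data based on the trained protein sequence target prediction model to obtain protein target characterization information.
Further, the device also includes:
a third construction module, configured to construct a knowledge graph based on a magnetic resonance diffusion tensor imaging data set. The knowledge graph contains at least two nodes, corresponding respectively to different molecular structure characterization information and different protein target characterization information, and two nodes are linked by an action relationship;
the determination module is specifically configured to search the knowledge graph for the action relationship corresponding to the molecular structure characterization information and the protein target characterization information, and convert that action relationship into a feature fusion coefficient. The feature fusion coefficient includes the interaction coefficient and the affinity coefficient between the molecular structure and the protein target.
Further, the device also includes:
a third training module, configured to construct a two-layer feed-forward neural network model and train it on feature fusion training sample data to obtain a trained dual-task prediction model. The input parameters of the dual-task prediction model are the fused molecular structure characterization information and protein target characterization information. The model performs dual-output processing comprising a classification prediction task and a regression prediction task, and its output is a drug-target action relationship containing an interaction prediction result and an affinity prediction result.
Further, the device also includes:
a retrieval module, configured to retrieve a preset drug-target associated structure image database, which stores drug molecule associated structure image data matched with different interaction coefficients and different affinity coefficients;
an output module, configured to search the preset drug-target associated structure image database for drug molecule associated structure image data matching the drug-target action relationship, and output it.
Further, the output module also handles the case where no matching drug molecule associated structure image data is found in the preset drug-target associated structure image database. In that case, it outputs the drug-target action relationship containing the interaction prediction result and the affinity prediction result, to indicate that the drug-target action relationship should be matched manually.
The embodiments of this application provide an artificial-intelligence-based device for determining drug-target action relationships. Compared with the prior art, the embodiments of this application:
- acquire drug molecule image data and protein sequence data of a target drug;
- extract molecular structure characterization information from the drug molecule image data, and protein target characterization information from the protein sequence data;
- obtain, from a knowledge graph, a feature fusion coefficient matching the molecular structure characterization information and the protein target characterization information, and perform feature fusion on both based on the feature fusion coefficient;
- perform prediction processing on the fused molecular structure characterization information and protein target characterization information based on a trained dual-task prediction model, taking the resulting prediction result as the drug-target action relationship.

This increases the diversity with which the action relationships between drug targets and diseases can be confirmed, and meets the need to confirm drug targets across multiple diseases. As a result, the accuracy of confirming drug-target action relationships improves, and the efficiency of drug use is greatly improved.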
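According to one embodiment of this application, a computer-readable storage medium is provided. It stores at least one executable instruction, which can perform the artificial-intelligence-based method for determining drug-target action relationships in any of the above method embodiments. The computer-readable storage medium may be non-volatile or volatile.
FIG. 7 shows a schematic structural diagram of a computer device provided according to one embodiment of this application. The specific embodiments of this application do not limit the specific implementation of the computer device.
As shown in FIG. 7, the computer device may include a processor 502, a communications interface 504, a memory 506, and a communication bus 508.
The processor 502, the communications interface 504 and the memory 506 communicate with one another through the communication bus 508.
The communications interface 504 is configured to communicate with network elements of other devices, such as clients or other servers.
The processor 502 is configured to execute a program 510. Specifically, it may perform the relevant steps in the above embodiments of the artificial-intelligence-based method for determining drug-target action relationships.
Specifically, the program 510 may include program code, and the program code includes computer operation instructions.
The processor 502 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of this application. The one or more processors in the computer device may be of the same type, such as one or more CPUs. They may also be of different types, such as one or more CPUs and one or more ASICs.
The memory 506 is configured to store the program 510. The memory 506 may include high-speed RAM, and may also include non-volatile memory, for example at least one disk memory.
The program 510 may specifically be used to cause the processor 502 to perform the following operations:
acquiring drug molecule image data and protein sequence data of a target drug;
extracting molecular structure characterization information from the drug molecule image data, and extracting protein target characterization information from the protein sequence data;
obtaining, from a knowledge graph, a feature fusion coefficient matching the molecular structure characterization information and the protein target characterization information, and performing feature fusion on both based on the feature fusion coefficient;
performing prediction processing on the fused molecular structure characterization information and protein target characterization information based on a trained dual-task prediction model, and taking the resulting prediction result as the drug-target action relationship.
Those skilled in the art will understand that the modules or steps of this application described above can be implemented with a general-purpose computing device. They can be concentrated on a single computing device, or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device. In some cases, the steps shown or described can be performed in an order different from that described here. They can also be fabricated as individual integrated circuit modules, or multiple modules or steps among them can be fabricated as a single integrated circuit module. Thus, this application is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of this application and are not intended to limit it. For those skilled in the art, this application may have various modifications and variations. Any modification, equivalent replacement, improvement or the like made within the spirit and principles of this application shall fall within the scope of protection of this application.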
Claims (20)
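- An artificial-intelligence-based method for determining drug-target action relationships, comprising: acquiring drug molecule image data and protein sequence data of a target drug; extracting molecular structure characterization information from the drug molecule image data, and extracting protein target characterization information from the protein sequence data; obtaining, from a knowledge graph, a feature fusion coefficient matching the molecular structure characterization information and the protein target characterization information, and performing feature fusion on the molecular structure characterization information and the protein target characterization information based on the feature fusion coefficient; and performing prediction processing on the fused molecular structure characterization information and protein target characterization information based on a trained dual-task prediction model, and taking the resulting prediction result as the drug-target action relationship.
- The method according to claim 1, wherein, before extracting the molecular structure characterization information from the drug molecule image data, the method further comprises: constructing an unlabeled compound graph isomorphism network model; and performing model training using the adjacency matrix, attribute information and connecting edges in drug molecule graph training data as input parameters of the unlabeled compound graph isomorphism network model, to obtain a trained molecular feature prediction model; and extracting the molecular structure characterization information from the drug molecule image data comprises: performing prediction processing on the drug molecule image data based on the trained molecular feature prediction model to obtain the molecular structure characterization information.
- The method according to claim 1, wherein, before extracting the protein target characterization information from the protein sequence data, the method further comprises: constructing an unlabeled protein sequence language network model; and performing model training using word-embedded protein sequence training data as input parameters of the unlabeled protein sequence language network model, to obtain a trained protein sequence target prediction model; and extracting the protein target characterization information from the protein sequence data comprises: performing prediction processing on the protein sequence data based on the trained protein sequence target prediction model to obtain the protein target characterization information.
- The method according to claim 2 or 3, wherein, before obtaining, from the knowledge graph, the feature fusion coefficient matching the molecular structure characterization information and the protein target characterization information, the method further comprises: constructing the knowledge graph based on a magnetic resonance diffusion tensor imaging data set, the knowledge graph containing at least two nodes corresponding respectively to different molecular structure characterization information and different protein target characterization information, wherein two nodes are linked by an action relationship; and obtaining, from the knowledge graph, the feature fusion coefficient matching the molecular structure characterization information and the protein target characterization information comprises: searching the knowledge graph for the action relationship corresponding to the molecular structure characterization information and the protein target characterization information, and converting the action relationship into the feature fusion coefficient, the feature fusion coefficient comprising an interaction coefficient and an affinity coefficient between the molecular structure and the protein target.
- The method according to claim 4, wherein, before performing prediction processing on the fused molecular structure characterization information and protein target characterization information based on the trained dual-task prediction model and taking the resulting prediction result as the drug-target action relationship, the method further comprises: constructing a two-layer feed-forward neural network model, and training the two-layer feed-forward neural network model based on feature fusion training sample data to obtain the trained dual-task prediction model, wherein input parameters of the dual-task prediction model are the fused molecular structure characterization information and protein target characterization information, and the dual-task prediction model is used to perform dual-output processing comprising a classification prediction task and a regression prediction task, to obtain a drug-target action relationship containing an interaction prediction result and an affinity prediction result.
- The method according to claim 5, wherein, after performing prediction processing on the fused molecular structure characterization information and protein target characterization information based on the trained dual-task prediction model and taking the resulting prediction result as the drug-target action relationship, the method further comprises: retrieving a preset drug-target associated structure image database, the preset drug-target associated structure image database storing drug molecule associated structure image data matched with different interaction coefficients and different affinity coefficients; and searching the preset drug-target associated structure image database for drug molecule associated structure image data matching the drug-target action relationship, and outputting it.
- The method according to claim 6, wherein the method further comprises: if no drug molecule associated structure image data matching the drug-target action relationship is found in the preset drug-target associated structure image database, outputting the drug-target action relationship containing the interaction prediction result and the affinity prediction result, to indicate that the drug-target action relationship is to be matched manually.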
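- An artificial-intelligence-based device for determining drug-target action relationships, comprising: an acquisition module, configured to acquire drug molecule image data and protein sequence data of a target drug; an extraction module, configured to extract molecular structure characterization information from the drug molecule image data, and extract protein target characterization information from the protein sequence data; a determination module, configured to obtain, from a knowledge graph, a feature fusion coefficient matching the molecular structure characterization information and the protein target characterization information, and perform feature fusion on the molecular structure characterization information and the protein target characterization information based on the feature fusion coefficient; and a processing module, configured to perform prediction processing on the fused molecular structure characterization information and protein target characterization information based on a trained dual-task prediction model, and take the resulting prediction result as the drug-target action relationship.
- A computer-readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement an artificial-intelligence-based method for determining drug-target action relationships, comprising: acquiring drug molecule image data and protein sequence data of a target drug; extracting molecular structure characterization information from the drug molecule image data, and extracting protein target characterization information from the protein sequence data; obtaining, from a knowledge graph, a feature fusion coefficient matching the molecular structure characterization information and the protein target characterization information, and performing feature fusion on the molecular structure characterization information and the protein target characterization information based on the feature fusion coefficient; and performing prediction processing on the fused molecular structure characterization information and protein target characterization information based on a trained dual-task prediction model, and taking the resulting prediction result as the drug-target action relationship.
- The computer-readable storage medium according to claim 9, wherein, when the computer-readable instructions are executed by the processor, before the molecular structure characterization information is extracted from the drug molecule image data, the method further comprises: constructing an unlabeled compound graph isomorphism network model; and performing model training using the adjacency matrix, attribute information and connecting edges in drug molecule graph training data as input parameters of the unlabeled compound graph isomorphism network model, to obtain a trained molecular feature prediction model; and extracting the molecular structure characterization information from the drug molecule image data comprises: performing prediction processing on the drug molecule image data based on the trained molecular feature prediction model to obtain the molecular structure characterization information.
- The computer-readable storage medium according to claim 9, wherein, when the computer-readable instructions are executed by the processor, before the protein target characterization information is extracted from the protein sequence data, the method further comprises: constructing an unlabeled protein sequence language network model; and performing model training using word-embedded protein sequence training data as input parameters of the unlabeled protein sequence language network model, to obtain a trained protein sequence target prediction model; and extracting the protein target characterization information from the protein sequence data comprises: performing prediction processing on the protein sequence data based on the trained protein sequence target prediction model to obtain the protein target characterization information.
- The computer-readable storage medium according to claim 10 or 11, wherein, when the computer-readable instructions are executed by the processor, before the feature fusion coefficient matching the molecular structure characterization information and the protein target characterization information is obtained from the knowledge graph, the method further comprises: constructing the knowledge graph based on a magnetic resonance diffusion tensor imaging data set, the knowledge graph containing at least two nodes corresponding respectively to different molecular structure characterization information and different protein target characterization information, wherein two nodes are linked by an action relationship; and obtaining, from the knowledge graph, the feature fusion coefficient matching the molecular structure characterization information and the protein target characterization information comprises: searching the knowledge graph for the action relationship corresponding to the molecular structure characterization information and the protein target characterization information, and converting the action relationship into the feature fusion coefficient, the feature fusion coefficient comprising an interaction coefficient and an affinity coefficient between the molecular structure and the protein target.
- The computer-readable storage medium according to claim 12, wherein, when the computer-readable instructions are executed by the processor, before prediction processing is performed on the fused molecular structure characterization information and protein target characterization information based on the trained dual-task prediction model and the resulting prediction result is taken as the drug-target action relationship, the method further comprises: constructing a two-layer feed-forward neural network model, and training the two-layer feed-forward neural network model based on feature fusion training sample data to obtain the trained dual-task prediction model, wherein input parameters of the dual-task prediction model are the fused molecular structure characterization information and protein target characterization information, and the dual-task prediction model is used to perform dual-output processing comprising a classification prediction task and a regression prediction task, to obtain a drug-target action relationship containing an interaction prediction result and an affinity prediction result.
- The computer-readable storage medium according to claim 13, wherein, when the computer-readable instructions are executed by the processor, after prediction processing is performed on the fused molecular structure characterization information and protein target characterization information based on the trained dual-task prediction model and the resulting prediction result is taken as the drug-target action relationship, the method further comprises: retrieving a preset drug-target associated structure image database, the preset drug-target associated structure image database storing drug molecule associated structure image data matched with different interaction coefficients and different affinity coefficients; and searching the preset drug-target associated structure image database for drug molecule associated structure image data matching the drug-target action relationship, and outputting it.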
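- A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the computer-readable instructions, when executed by the processor, implement an artificial-intelligence-based method for determining drug-target action relationships, comprising: acquiring drug molecule image data and protein sequence data of a target drug; extracting molecular structure characterization information from the drug molecule image data, and extracting protein target characterization information from the protein sequence data; obtaining, from a knowledge graph, a feature fusion coefficient matching the molecular structure characterization information and the protein target characterization information, and performing feature fusion on the molecular structure characterization information and the protein target characterization information based on the feature fusion coefficient; and performing prediction processing on the fused molecular structure characterization information and protein target characterization information based on a trained dual-task prediction model, and taking the resulting prediction result as the drug-target action relationship.
- The computer device according to claim 15, wherein, when the computer-readable instructions are executed by the processor, before the molecular structure characterization information is extracted from the drug molecule image data, the method further comprises: constructing an unlabeled compound graph isomorphism network model; and performing model training using the adjacency matrix, attribute information and connecting edges in drug molecule graph training data as input parameters of the unlabeled compound graph isomorphism network model, to obtain a trained molecular feature prediction model; and extracting the molecular structure characterization information from the drug molecule image data comprises: performing prediction processing on the drug molecule image data based on the trained molecular feature prediction model to obtain the molecular structure characterization information.
- The computer device according to claim 15, wherein, when the computer-readable instructions are executed by the processor, before the protein target characterization information is extracted from the protein sequence data, the method further comprises: constructing an unlabeled protein sequence language network model; and performing model training using word-embedded protein sequence training data as input parameters of the unlabeled protein sequence language network model, to obtain a trained protein sequence target prediction model; and extracting the protein target characterization information from the protein sequence data comprises: performing prediction processing on the protein sequence data based on the trained protein sequence target prediction model to obtain the protein target characterization information.
- The computer device according to claim 16 or 17, wherein, when the computer-readable instructions are executed by the processor, before the feature fusion coefficient matching the molecular structure characterization information and the protein target characterization information is obtained from the knowledge graph, the method further comprises: constructing the knowledge graph based on a magnetic resonance diffusion tensor imaging data set, the knowledge graph containing at least two nodes corresponding respectively to different molecular structure characterization information and different protein target characterization information, wherein two nodes are linked by an action relationship; and obtaining, from the knowledge graph, the feature fusion coefficient matching the molecular structure characterization information and the protein target characterization information comprises: searching the knowledge graph for the action relationship corresponding to the molecular structure characterization information and the protein target characterization information, and converting the action relationship into the feature fusion coefficient, the feature fusion coefficient comprising an interaction coefficient and an affinity coefficient between the molecular structure and the protein target.
- The computer device according to claim 18, wherein, when the computer-readable instructions are executed by the processor, before prediction processing is performed on the fused molecular structure characterization information and protein target characterization information based on the trained dual-task prediction model and the resulting prediction result is taken as the drug-target action relationship, the method further comprises: constructing a two-layer feed-forward neural network model, and training the two-layer feed-forward neural network model based on feature fusion training sample data to obtain the trained dual-task prediction model, wherein input parameters of the dual-task prediction model are the fused molecular structure characterization information and protein target characterization information, and the dual-task prediction model is used to perform dual-output processing comprising a classification prediction task and a regression prediction task, to obtain a drug-target action relationship containing an interaction prediction result and an affinity prediction result.
- The computer device according to claim 19, wherein, when the computer-readable instructions are executed by the processor, after prediction processing is performed on the fused molecular structure characterization information and protein target characterization information based on the trained dual-task prediction model and the resulting prediction result is taken as the drug-target action relationship, the method further comprises: retrieving a preset drug-target associated structure image database, the preset drug-target associated structure image database storing drug molecule associated structure image data matched with different interaction coefficients and different affinity coefficients; and searching the preset drug-target associated structure image database for drug molecule associated structure image data matching the drug-target action relationship, and outputting it.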
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210028223.3 | 2022-01-11 | ||
CN202210028223.3A CN114360639A (zh) | 2022-01-11 | 2022-01-11 | Artificial-intelligence-based method and device for determining drug-target action relationships |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023134062A1 (zh) | 2023-07-20 |
Family ID: 81109656
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/089690 WO2023134062A1 (zh) | 2022-01-11 | 2022-04-27 | Method and apparatus for determining drug-target action relationships based on artificial intelligence |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114360639A (zh) |
WO (1) | WO2023134062A1 (zh) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
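CN114360639A (zh) * | 2022-01-11 | 2022-04-15 | 平安科技(深圳)有限公司 | Method and apparatus for determining drug-target action relationships based on artificial intelligence |
CN114925270B (zh) * | 2022-05-09 | 2024-07-19 | 华南师范大学 | Session recommendation method and model |
CN115440314B (zh) * | 2022-09-06 | 2023-08-15 | 湖南艾科瑞生物工程有限公司 | Method for detecting the electrophoretic performance of agarose gel, and related device |
CN116246697B (zh) * | 2023-05-11 | 2023-08-01 | 上海微观纪元数字科技有限公司 | Target protein prediction method for drugs, and apparatus, device and storage medium |
CN116451176B (zh) * | 2023-06-15 | 2024-01-12 | 武汉大学人民医院(湖北省人民医院) | Deep-learning-based method and apparatus for analyzing drug spectral data |
CN117524298B (zh) * | 2023-11-03 | 2024-07-19 | 和合数据科技(深圳)有限公司 | Method and apparatus for actively searching for, analyzing, comparing and giving early warning of drug action targets |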
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
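CN112331273A (zh) * | 2020-10-28 | 2021-02-05 | 星药科技(北京)有限公司 | Method for predicting small-molecule drug and protein target reactions based on multi-dimensional information |
CN113160894A (zh) * | 2021-04-23 | 2021-07-23 | 平安科技(深圳)有限公司 | Method, apparatus, device and storage medium for predicting drug-target interactions |
CN113409897A (zh) * | 2021-05-25 | 2021-09-17 | 电子科技大学长三角研究院(衢州) | Method, apparatus, device and storage medium for predicting drug-target interactions |
CN113470741A (zh) * | 2021-07-28 | 2021-10-01 | 腾讯科技(深圳)有限公司 | Drug-target relationship prediction method, apparatus, computer device and storage medium |
CN114360639A (zh) * | 2022-01-11 | 2022-04-15 | 平安科技(深圳)有限公司 | Method and apparatus for determining drug-target action relationships based on artificial intelligence |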
Application Events (2022)
- 2022-01-11: CN application CN202210028223.3A, published as CN114360639A (zh); status: active, pending
- 2022-04-27: WO application PCT/CN2022/089690, published as WO2023134062A1 (zh); status: unknown
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
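CN117094987A (zh) * | 2023-10-13 | 2023-11-21 | 四川大学 | Method for optimizing the direction of a neuromodulation physical field |
CN117094987B (zh) * | 2023-10-13 | 2023-12-22 | 四川大学 | Method for optimizing the direction of a neuromodulation physical field |
CN117877580A (zh) * | 2023-12-29 | 2024-04-12 | 深药科技(苏州)有限公司 | Method, device and medium for predicting key sites of polypeptides based on a deep language model |
CN117831640A (zh) * | 2024-03-05 | 2024-04-05 | 青岛国实科技集团有限公司 | Supercomputing-based digital twin platform for the pharmaceutical industry |
CN117831640B (zh) * | 2024-03-05 | 2024-05-14 | 青岛国实科技集团有限公司 | Supercomputing-based digital twin platform for the pharmaceutical industry |
CN118197402A (zh) * | 2024-04-02 | 2024-06-14 | 宁夏大学 | Method, apparatus and device for predicting drug-target relationships |
CN118506856A (zh) * | 2024-07-18 | 2024-08-16 | 中国石油大学(华东) | Artificial-intelligence-based method and system for predicting drug-target interactions |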
Also Published As
Publication number | Publication date |
---|---|
CN114360639A (zh) | 2022-04-15 |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22919703; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |