CN114446413B - Molecular property prediction method and device and electronic equipment - Google Patents

Molecular property prediction method and device and electronic equipment

Info

Publication number
CN114446413B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210165349.5A
Other languages
Chinese (zh)
Other versions
CN114446413A (en)
Inventor
李双利
周景博
徐童
窦德景
熊辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210165349.5A
Publication of CN114446413A
Application granted
Publication of CN114446413B
Legal status: Active

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C: COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00: Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50: Molecular design, e.g. of drugs
    • G16C20/70: Machine learning, data mining or chemometrics

Landscapes

  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Physics & Mathematics (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Medicinal Chemistry (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure provides a molecular property prediction method, a molecular property prediction device and electronic equipment, relates to the technical field of computers, and particularly relates to the technical field of deep learning. The specific implementation scheme is as follows: generating a two-dimensional molecular diagram and a three-dimensional molecular diagram of the target molecule according to the molecular formula of the target molecule; for each edge in the two-dimensional molecular diagram and the three-dimensional molecular diagram, aggregating the atomic characteristics of two atoms connected with the edge and the edge characteristics of the edge to obtain new edge characteristics of the edge; for each atom in the two-dimensional molecular diagram and the three-dimensional molecular diagram, aggregating edge features connected to each edge of the atom to obtain new atomic features of the atom; according to the atomic characteristics of each atom, determining the two-dimensional characteristics of the two-dimensional molecular diagram and the three-dimensional characteristics of the three-dimensional molecular diagram, and according to the mapping relation between the preset characteristics and the chemical properties, determining the chemical properties corresponding to the two-dimensional characteristics and the three-dimensional characteristics as the chemical properties of the target molecules. The molecular properties of the target molecule can be predicted more accurately.

Description

Molecular property prediction method and device and electronic equipment
Technical Field
The present disclosure relates to the field of computer technology, and in particular, to the field of deep learning technology.
Background
Predicting the chemical properties of molecules with a property prediction model obtained through deep learning training is of great significance in scenarios such as drug research and development. Molecules are usually expressed in the form of a molecular formula, but a property prediction model cannot recognize a molecular formula directly and can only recognize a feature vector (hereinafter referred to as a feature). Features that characterize the molecule therefore need to be extracted based on the molecular formula.
Disclosure of Invention
The disclosure provides a molecular property prediction method, a molecular property prediction device and electronic equipment.
According to a first aspect of the present disclosure, there is provided a molecular property prediction method comprising:
Generating a two-dimensional molecular diagram and a three-dimensional molecular diagram of the target molecule according to the molecular formula of the target molecule, wherein the two-dimensional molecular diagram and the three-dimensional molecular diagram comprise atoms and edges, the atoms in the two-dimensional molecular diagram and the three-dimensional molecular diagram are used for representing the atoms in the target molecule, the edges in the two-dimensional molecular diagram are used for representing chemical bonds among the atoms in the target molecule, and the edges in the three-dimensional molecular diagram are used for representing the position relationship among the atoms in the target molecule;
For each edge in the two-dimensional molecular diagram and the three-dimensional molecular diagram, aggregating the atomic characteristics of two atoms connected with the edge and the edge characteristics of the edge to obtain new edge characteristics of the edge;
For each atom in the two-dimensional molecular diagram and the three-dimensional molecular diagram, aggregating edge features connected to each edge of the atom to obtain new atomic features of the atom;
determining the two-dimensional characteristics of the two-dimensional molecular diagram and the three-dimensional characteristics of the three-dimensional molecular diagram according to the atomic characteristics of each atom;
and determining the chemical properties corresponding to the two-dimensional features and the three-dimensional features according to the mapping relation between the preset features and the chemical properties, and taking the chemical properties as the chemical properties of the target molecules.
According to a second aspect of the present disclosure, there is provided a molecular property prediction apparatus comprising:
A molecule diagram generation module, configured to generate a two-dimensional molecule diagram and a three-dimensional molecule diagram of a target molecule according to a molecular formula of the target molecule, where the two-dimensional molecule diagram and the three-dimensional molecule diagram include atoms and edges, the atoms in the two-dimensional molecule diagram and the three-dimensional molecule diagram are used for representing atoms in the target molecule, the edges in the two-dimensional molecule diagram are used for representing chemical bonds between atoms in the target molecule, and the edges in the three-dimensional molecule diagram are used for representing a positional relationship between atoms in the target molecule;
An atom-to-edge propagation module, configured to aggregate, for each edge in the two-dimensional molecular diagram and the three-dimensional molecular diagram, the atomic features of the two atoms connected to the edge and the edge feature of the edge, to obtain a new edge feature of the edge;
An edge-to-atom propagation module, configured to aggregate, for each atom in the two-dimensional molecular diagram and the three-dimensional molecular diagram, the edge features of each edge connected to the atom, to obtain a new atomic feature of the atom;
the characteristic determining module is used for determining the two-dimensional characteristic of the two-dimensional molecular graph and the three-dimensional characteristic of the three-dimensional molecular graph according to the atomic characteristic of each atom to be used as the molecular characterization of the target molecule;
And the property prediction module is used for determining the chemical properties corresponding to the two-dimensional characteristics and the three-dimensional characteristics as the chemical properties of the target molecules according to the mapping relation between the preset characteristics and the chemical properties.
According to a third aspect of the present disclosure, there is provided an electronic device comprising:
At least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the first aspects above.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method according to any one of the first aspects above.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic flow diagram of a molecular property prediction method according to the present disclosure;
FIG. 2 is a schematic diagram of the structure of a message passing network for implementing a molecular property prediction method in accordance with the present disclosure;
FIG. 3 is a schematic structural view of a molecular property prediction apparatus according to the present disclosure;
FIG. 4 is a block diagram of an electronic device for implementing a molecular property prediction method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In order to illustrate the molecular characterization model training method provided by the present disclosure more clearly, one possible application scenario is described below. The following example is only one possible application scenario; in other possible embodiments, the method may also be applied to other scenarios, and the example does not constitute any limitation.
The efficacy of a drug depends on the chemical nature of each molecule in the drug, so that accurate prediction of the chemical nature of the molecule plays a vital role in drug development. In the related art, the chemical properties of the molecules are predicted based on a property prediction model obtained through deep learning training, and the property prediction model can learn the mapping relation between the representation of the molecules and the chemical properties of the molecules in the process of deep learning, so that the representation of the molecules can be mapped into the chemical properties of the molecules.
Typically, a molecule is represented by a molecular formula, but a property prediction model often cannot recognize a molecular formula and can only recognize features, so the molecule needs to be represented as a feature. In the related art, the features of a molecule are often extracted from a molecular formula in SMILES form (a specification that describes molecular structure unambiguously with ASCII character strings). However, a molecular formula in SMILES form can only represent the chemical bonds and functional groups in the molecule and cannot represent its spatial structure information. Features extracted from a SMILES molecular formula therefore also fail to represent the spatial structure information, that is, they cannot fully characterize the molecule, which makes it difficult to subsequently predict the chemical properties of the molecule accurately based on those features.
Based on this, the present disclosure provides a molecular property prediction method, which is applied to any electronic device having a molecular property prediction capability, such as a server, a personal computer, etc., and the molecular property prediction method provided by the present disclosure is shown in fig. 1, and includes:
S101, generating a two-dimensional molecular diagram and a three-dimensional molecular diagram of the target molecule according to the molecular formula of the target molecule.
The two-dimensional molecular diagram and the three-dimensional molecular diagram comprise atoms and edges, the atoms in the two-dimensional molecular diagram and the three-dimensional molecular diagram are used for representing the atoms in the target molecule, the edges in the two-dimensional molecular diagram are used for representing chemical bonds among the atoms in the target molecule, and the edges in the three-dimensional molecular diagram are used for representing the position relationship among the atoms in the target molecule.
S102, for each edge in the two-dimensional molecular diagram and the three-dimensional molecular diagram, aggregating the atomic characteristics of two atoms connected with the edge and the edge characteristics of the edge to obtain new edge characteristics of the edge.
S103, for each atom in the two-dimensional molecular diagram and the three-dimensional molecular diagram, aggregating edge characteristics of each edge connected to the atom to obtain new atomic characteristics of the atom.
S104, determining the two-dimensional characteristics of the two-dimensional molecular diagram and the three-dimensional characteristics of the three-dimensional molecular diagram according to the atomic characteristics of each atom.
S105, determining chemical properties corresponding to the two-dimensional features and the three-dimensional features according to a preset mapping relation between the features and the chemical properties, and taking the chemical properties as the chemical properties of the target molecules.
In this embodiment, the atomic features of the atoms are first propagated to the edges of the molecular diagram, and the edge features of the edges are then propagated back to the atoms, so that the atomic feature of each atom contains not only features characterizing the atom itself but also features characterizing the structure of the molecular diagram. The edges in the two-dimensional molecular diagram represent the chemical bonds in the target molecule, so the structure of the two-dimensional molecular diagram reflects the chemical semantics of the molecule; the edges in the three-dimensional molecular diagram represent the positional relationship among the atoms in the target molecule, so the structure of the three-dimensional molecular diagram reflects the spatial structure information of the molecule. Through feature propagation, the atomic feature of each atom therefore reflects, to a certain extent, both the chemical semantics and the spatial structure information of the target molecule. The two-dimensional features and three-dimensional features obtained from the atomic features of each atom can thus represent both the chemical semantics and the spatial structure information of the target molecule, that is, they characterize the target molecule more comprehensively. Consequently, the chemical property obtained by mapping the two-dimensional features and the three-dimensional features is more accurate.
The steps of the foregoing S101 to S105 will be described in detail below:
in S101, the molecular formula of the target molecule is represented in any form that can be recognized by the electronic device, such as a form of SMILES.
A two-dimensional molecular graph can be considered as a set of nodes, edges, coordinates, where the number of nodes is equal to the number of atoms in a target molecule, and each node is used to represent one atom in the target molecule, and different nodes represent different atoms.
Each edge in the two-dimensional molecular diagram is used to connect two nodes and to represent a chemical bond between atoms represented by the two nodes, and illustratively, if a first edge in the two-dimensional molecular diagram connects a first node and a second node, the first node represents a first atom, the second node represents a second atom, and a first chemical bond exists between the first atom and the second atom in the target molecule, the first edge is used to represent the first chemical bond.
Each coordinate corresponds to a node, a different coordinate corresponds to a different node, and each coordinate is used to represent a position of an atom represented by the corresponding node in a molecular structure diagram of the target molecule.
It will be appreciated that, for ease of observation, a molecular structure diagram often shows only the topological structure between the atoms in the molecule and cannot show the spatial positional relationship between them. For example, assume that in the molecule a chemical bond exists between a first atom and a second atom at a first distance, a chemical bond exists between the second atom and a third atom at a second distance, and the first distance is far greater than the second distance. For ease of observation, the molecular structure diagram nevertheless often draws the distance between the first atom and the second atom equal to the distance between the second atom and the third atom.
The two-dimensional molecular diagram may be represented in the form of an image or may be represented in other forms than an image, for example, the two-dimensional molecular diagram may be represented in the form of any structure capable of representing a set of nodes, edges, and coordinates.
Similarly to a two-dimensional molecular diagram, a three-dimensional molecular diagram can also be regarded as a set of nodes, edges, coordinates, wherein the number of nodes is equal to the number of atoms in the target molecule, and each node is used to represent one atom in the target molecule, and different nodes represent different atoms.
Each edge is used to connect two nodes and to represent the spatial positional relationship of the atoms represented by the two connected nodes. The number of edges may vary according to the application scenario, and in one possible embodiment, there is an edge between every two nodes in the three-dimensional molecular diagram, and in another possible embodiment, there is only an edge between some nodes and no edge between other nodes in the three-dimensional molecular diagram, for example, there is an edge between only nodes whose represented distance between atoms is less than a preset distance threshold and there is no edge between nodes whose represented distance between atoms is greater than the preset distance threshold.
Each coordinate corresponds to a node, a different coordinate corresponds to a different node, and each coordinate is used to represent the spatial position, in three-dimensional space, of the atom represented by the corresponding node.
The three-dimensional molecular diagram may be represented in the form of an image or may be represented in other forms than an image, for example, the three-dimensional molecular diagram may be represented in the form of any structure capable of representing a set of nodes, edges, and coordinates.
As described above, both the two-dimensional molecular diagram and the three-dimensional molecular diagram can be regarded as a set of nodes, edges, and coordinates. For convenience of description, the two-dimensional molecular diagram is hereinafter denoted as $\{\mathcal{V}, \varepsilon_{2d}, c_{2d}\}$ and the three-dimensional molecular diagram as $\{\mathcal{V}, \varepsilon_{3d}, c_{3d}\}$, where $\mathcal{V}$ is the set of nodes, $\varepsilon_{2d}$ is the set of edges in the two-dimensional molecular diagram, $c_{2d}$ is the set of coordinates in the two-dimensional molecular diagram, $\varepsilon_{3d}$ is the set of edges in the three-dimensional molecular diagram, and $c_{3d}$ is the set of coordinates in the three-dimensional molecular diagram.
Because there is a one-to-one correspondence between a node and an atom, the node is referred to herein for convenience of description as an atom represented by the node.
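The node, edge, and coordinate sets described above can be held in a small container. The following is a minimal sketch only: the class name `MolecularGraph`, its field names, and the water-molecule coordinates are hypothetical and chosen purely for illustration; the disclosure does not prescribe any particular data structure.

```python
from dataclasses import dataclass

@dataclass
class MolecularGraph:
    """A molecular diagram viewed as a set of nodes, edges and coordinates."""
    atoms: list    # node set: one entry per atom (here, element symbols)
    edges: list    # edge set: (u, v) pairs of atom indices
    coords: list   # coordinate set: one position per atom, same order as atoms

    def incident_edges(self, v):
        """Return all edges that touch atom v."""
        return [edge for edge in self.edges if v in edge]

# Hypothetical water molecule (O, H, H); all coordinates are made up for illustration.
water_2d = MolecularGraph(
    atoms=["O", "H", "H"],
    edges=[(0, 1), (0, 2)],                    # the two O-H covalent bonds
    coords=[(0.0, 0.0), (-1.0, -1.0), (1.0, -1.0)],
)
water_3d = MolecularGraph(
    atoms=["O", "H", "H"],
    edges=[(0, 1), (0, 2), (1, 2)],            # every pair within a distance threshold
    coords=[(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)],
)
print(water_2d.incident_edges(1))  # [(0, 1)]
print(water_3d.incident_edges(1))  # [(0, 1), (1, 2)]
```

Both diagrams share the same atom list and differ only in their edge and coordinate sets, which mirrors the two notations introduced above.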
The present disclosure does not limit the way in which the two-dimensional molecular diagram and the three-dimensional molecular diagram are generated. By way of example, the following describes how each of them may be generated:
For the three-dimensional molecular diagram, the spatial position of each atom of the target molecule in three-dimensional space is determined from the molecular formula of the target molecule based on the energy approximation principle, and every two atoms whose distance is smaller than a preset distance threshold are connected according to their spatial positions, to obtain the three-dimensional molecular diagram of the target molecule.
The energy approximation principle refers to: of the intramolecular symmetry matching atomic orbitals, only atomic orbitals with similar energy can be combined into an effective molecular orbital. The molecular formula of the target molecule can be mapped into the spatial position in the three-dimensional space by using the mapping relation designed based on the energy approximation principle, so that the spatial position of each atom in the target molecule in the three-dimensional space is obtained. The preset distance threshold may be set according to actual requirements and/or user experience.
It will be appreciated that the molecular formula of the target molecule is known, so the set $\mathcal{V}$ of atoms in the target molecule is known. Once the spatial position of each atom of the target molecule in three-dimensional space has been determined, the set $c_{3d}$ of coordinates is also known. Connecting the atoms whose distance is smaller than the preset distance threshold yields a set of edges. Because only atoms closer than the preset distance threshold are connected, each resulting edge indicates that the connected atoms are close to each other, so the edge set represents the spatial positional relationship and can serve as the edge set $\varepsilon_{3d}$ of the three-dimensional molecular diagram. In this way, the three-dimensional molecular diagram $\{\mathcal{V}, \varepsilon_{3d}, c_{3d}\}$ representing the positional relationship between the atoms in the target molecule is obtained.
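The distance-threshold rule for the edges of the three-dimensional molecular diagram can be sketched as follows. This is an illustration under assumptions: the function name `build_3d_edges`, the example coordinates, and the threshold values are hypothetical, and the spatial positions are taken as given rather than derived from the molecular formula.

```python
import math
from itertools import combinations

def build_3d_edges(coords, threshold):
    """Connect every pair of atoms whose Euclidean distance is below threshold."""
    edges = []
    for u, v in combinations(range(len(coords)), 2):
        if math.dist(coords[u], coords[v]) < threshold:
            edges.append((u, v))
    return edges

# Hypothetical spatial positions for a three-atom molecule (O, H, H).
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]

print(build_3d_edges(coords, threshold=1.2))  # [(0, 1), (0, 2)]
print(build_3d_edges(coords, threshold=2.0))  # [(0, 1), (0, 2), (1, 2)]
```

A smaller threshold keeps only the closest pairs, while a sufficiently large threshold yields an edge between every two atoms, which corresponds to the two embodiments of the edge set described earlier.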
It will be appreciated that, because of the uncertainty in the energy approximation principle, the spatial position of each atom in the molecule in three-dimensional space cannot be uniquely determined based on the energy approximation principle, and therefore, in a possible embodiment, the foregoing determination of the spatial position of each atom in the target molecule in three-dimensional space based on the energy approximation principle is achieved by:
The predicted positions of the atoms in the target molecule in the three-dimensional space are determined for multiple times based on the energy approximation principle, and the predicted positions determined for multiple times are not identical (even completely different) because of certain uncertainty of the energy approximation principle. For each atom in the target molecule, the mean value of each predicted position of the atom is determined as the spatial position of the atom in three-dimensional space.
By way of example, assume that P predicted positions are determined in total, that the spatial coordinates of the first atom in the predicted positions obtained by the first determination are $c_1$, that its spatial coordinates in the predicted positions obtained by the second determination are $c_2$, and so on. The spatial position of the first atom is then calculated by formula (1):
$$c = \frac{1}{P}\sum_{i=1}^{P} c_i \qquad (1)$$
where c is the spatial position of the first atom. With this embodiment, the predicted positions can be determined repeatedly based on the energy approximation principle and their mean taken as the spatial position, which eliminates the uncertainty of the energy approximation principle to a certain extent, improves the accuracy of the resulting three-dimensional molecular diagram, and in turn improves the accuracy of the molecular characterization model obtained by subsequent training.
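Averaging the P predicted positions of one atom can be sketched as below. The function name `mean_position` and the sample coordinates are hypothetical, and the sketch assumes that all predicted positions are expressed in the same coordinate frame so that a component-wise mean is meaningful.

```python
def mean_position(predicted):
    """Average the P predicted coordinates c_1..c_P of one atom, component-wise."""
    p = len(predicted)
    return tuple(sum(axis) / p for axis in zip(*predicted))

# Three hypothetical predictions (P = 3) for the same atom.
predictions = [(0.0, 1.0, 2.0), (2.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
print(mean_position(predictions))  # (1.0, 1.0, 1.0)
```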
For the two-dimensional molecular diagram, every two atoms connected by a covalent bond are connected according to the molecular formula of the target molecule, to obtain the two-dimensional molecular diagram of the target molecule.
It will be appreciated that the molecular formula of the target molecule is known, so the set $\mathcal{V}$ of atoms in the target molecule is known. The molecular structure diagram of the target molecule is also known, so the positions of the atoms in the molecular structure diagram, that is, the set $c_{2d}$ of coordinates, are known. Connecting every two atoms joined by a covalent bond yields a set of edges. Because only covalently bonded atoms are connected, each resulting edge represents a covalent bond in the target molecule, so the edge set can serve as the edge set $\varepsilon_{2d}$ of the two-dimensional molecular diagram. In this way, the two-dimensional molecular diagram $\{\mathcal{V}, \varepsilon_{2d}, c_{2d}\}$ representing the chemical bonds between the atoms in the target molecule is obtained.
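Building the edge set of the two-dimensional molecular diagram from covalent bonds can be sketched as follows. The function name `build_2d_edges` is hypothetical, and the bond list is written by hand here; in practice it would be derived from the molecular formula (for example, by parsing a SMILES string), which this sketch does not attempt.

```python
def build_2d_edges(bonds):
    """Turn a list of covalent bonds (u, v) into a sorted, duplicate-free edge list."""
    return sorted({(min(u, v), max(u, v)) for u, v in bonds if u != v})

# Heavy atoms of ethanol (CH3-CH2-OH): index 0 = C, 1 = C, 2 = O.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (2, 1), (1, 0)]   # (1, 0) repeats the C-C bond in reverse order
print(build_2d_edges(bonds))       # [(0, 1), (1, 2)]
```

Normalizing each pair to (smaller index, larger index) ensures that one covalent bond always maps to exactly one edge.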
In S102, the manner of aggregation includes, but is not limited to, splicing, fusion, and the like, and is not limited in any way by the present disclosure. Illustratively, in one possible embodiment, for each edge in the two-dimensional molecular diagram, the atomic features of the two atoms connected by the edge and the edge feature of the edge are aggregated according to formula (2):
$$h_{uv}^{2d} = \mathrm{MLP}\left(a_u^{2d},\ a_v^{2d},\ e_{uv}^{2d}\right) \qquad (2)$$
where $h_{uv}^{2d}$ is the new edge feature of the edge connecting atom u and atom v in the two-dimensional molecular diagram, $a_u^{2d}$ is the atomic feature of atom u in the two-dimensional molecular diagram, $a_v^{2d}$ is the atomic feature of atom v in the two-dimensional molecular diagram, $e_{uv}^{2d}$ is the initial edge feature of the edge connecting atom u and atom v in the two-dimensional molecular diagram, and MLP(·) is the message transfer function.
For each edge in the three-dimensional molecular diagram, the atomic features of the two atoms connected by the edge and the edge feature of the edge are aggregated according to formula (3):
$$h_{uv}^{3d} = \mathrm{MLP}\left(a_u^{3d},\ a_v^{3d},\ r_{uv}\right) \qquad (3)$$
where $h_{uv}^{3d}$ is the new edge feature of the edge connecting atom u and atom v in the three-dimensional molecular diagram, $a_u^{3d}$ is the atomic feature of atom u in the three-dimensional molecular diagram, $a_v^{3d}$ is the atomic feature of atom v in the three-dimensional molecular diagram, $r_{uv}$ is the initial edge feature of the edge connecting atom u and atom v in the three-dimensional molecular diagram, and MLP(·) is the message transfer function. The message transfer function in formula (3) and the message transfer function in formula (2) may be the same function or different functions.
By aggregating the atomic features of the atoms connected to the edge and the edge features of the edge, the new edge features of the edge can be made to contain the atomic features of the connected atoms, i.e. the propagation of features from atoms to edges is achieved.
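The atom-to-edge propagation can be sketched as below. The sketch rests on assumptions that go beyond the text: "splicing" is taken to mean concatenation of the two atomic features and the edge feature, and the message transfer function is a hypothetical two-layer perceptron with ReLU activation and randomly initialized weights (`make_mlp`); a trained model would use learned weights instead.

```python
import random

def make_mlp(in_dim, hidden_dim, out_dim, seed=0):
    """Build a tiny two-layer perceptron with fixed random weights (illustration only)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hidden_dim)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(out_dim)]

    def mlp(x):
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]  # ReLU
        return [sum(w * hi for w, hi in zip(row, hidden)) for row in w2]

    return mlp

def atom_to_edge(atom_feats, edge_feats, mlp):
    """New feature of edge (u, v) = MLP(concat(a_u, a_v, e_uv))."""
    return {
        (u, v): mlp(atom_feats[u] + atom_feats[v] + e_uv)   # list concatenation
        for (u, v), e_uv in edge_feats.items()
    }

atom_feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.5, 0.5]}   # 2-dim atomic features
edge_feats = {(0, 1): [1.0], (1, 2): [2.0]}                  # 1-dim initial edge features
mlp = make_mlp(in_dim=5, hidden_dim=8, out_dim=4)
new_edge_feats = atom_to_edge(atom_feats, edge_feats, mlp)
print({edge: len(feat) for edge, feat in new_edge_feats.items()})  # {(0, 1): 4, (1, 2): 4}
```

Note that concatenation depends on the order of u and v; an order-independent variant could sum the two atomic features before concatenating, which is a design choice the text leaves open.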
In S103, the manner of aggregation includes, but is not limited to, splicing, fusion, and the like, and is not limited in any way by the present disclosure. Illustratively, in one possible embodiment, for an atom v in the two-dimensional molecular diagram, the edge features of each edge connected to the atom are aggregated according to formula (4):
$$\hat{a}_v^{2d} = \sum_{uv \in D(a_v)} \left(W^{2d}\, l_{uv}\right) \odot h_{uv}^{2d} \qquad (4)$$
where $\hat{a}_v^{2d}$ is the new atomic feature of atom v in the two-dimensional molecular diagram, $h_{uv}^{2d}$ is the edge feature of the edge connecting atom u and atom v in the two-dimensional molecular diagram, $D(a_v)$ is the set of all edges connected to atom v in the two-dimensional molecular diagram, $W^{2d}$ is a preset weight matrix, $l_{uv}$ is a geometric characterization vector used to represent the distance between atom u and atom v in the two-dimensional molecular diagram, and $\odot$ denotes the element-wise product. As an example, assuming vector A is (a1, a2, a3) and vector B is (b1, b2, b3), A ⊙ B = (a1×b1, a2×b2, a3×b3).
For an atom v in the three-dimensional molecular diagram, the edge features of each edge connected to the atom are aggregated according to formula (5):
$$\hat{a}_v^{3d} = \sum_{uv \in D(a_v)} \left(W^{3d}\, r_{uv}\right) \odot h_{uv}^{3d} \qquad (5)$$
where $\hat{a}_v^{3d}$ is the new atomic feature of atom v in the three-dimensional molecular diagram, $h_{uv}^{3d}$ is the edge feature of the edge connecting atom u and atom v in the three-dimensional molecular diagram, $D(a_v)$ is the set of all edges connected to atom v in the three-dimensional molecular diagram, $W^{3d}$ is a preset weight matrix, and $r_{uv}$ is a geometric characterization vector used to represent the distance between atom u and atom v in the three-dimensional molecular diagram.
By aggregating edge features of edges connected to the same atom, the new atomic feature of the atom can be made to include edge features of edges connected to the atom, i.e., edge-to-atom propagation features are achieved.
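The edge-to-atom propagation can be sketched as follows. The exact form of the aggregation is an assumption here: each incident edge contributes the element-wise product of its edge feature with the weight matrix applied to the geometric characterization vector, and the contributions are summed per atom. All names (`matvec`, `hadamard`, `edge_to_atom`) and all numeric values are hypothetical.

```python
def matvec(matrix, vector):
    """Matrix-vector product for plain nested lists."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def hadamard(a, b):
    """Element-wise product: (a1*b1, a2*b2, ...)."""
    return [x * y for x, y in zip(a, b)]

def edge_to_atom(num_atoms, edge_feats, geom_vecs, weight):
    """For each atom, sum (W @ g_uv) * h_uv over all edges incident to that atom."""
    dim = len(weight)
    new_atoms = [[0.0] * dim for _ in range(num_atoms)]
    for (u, v), h_uv in edge_feats.items():
        gated = hadamard(matvec(weight, geom_vecs[(u, v)]), h_uv)
        for atom in (u, v):   # an undirected edge contributes to both of its endpoints
            new_atoms[atom] = [s + g for s, g in zip(new_atoms[atom], gated)]
    return new_atoms

edge_feats = {(0, 1): [1.0, 2.0], (1, 2): [3.0, 4.0]}   # edge features h_uv
geom_vecs = {(0, 1): [1.0, 0.0], (1, 2): [0.0, 1.0]}    # geometric vectors (l_uv or r_uv)
weight = [[1.0, 2.0], [3.0, 4.0]]                       # preset weight matrix W

print(hadamard([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))       # [4.0, 10.0, 18.0]
print(edge_to_atom(3, edge_feats, geom_vecs, weight))   # [[1.0, 6.0], [7.0, 22.0], [6.0, 16.0]]
```

Atom 1 touches both edges, so its new feature is the sum of both gated edge features, while atoms 0 and 2 each receive a single contribution.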
Also, S102 and S103 may be performed only once, or may be repeated a plurality of times so that the features are sufficiently propagated in the molecular diagram. For example, after S103 is performed, the flow returns to S102 until a preset number of cycles is reached, and then S104 is performed.
In S104, as described above, feature information propagates from atoms to edges in S102 and from edges to atoms in S103, so it propagates along the topology of the molecular diagram. The atomic features of each atom therefore include not only features characterizing the atom itself but also features characterizing the topology of the molecular diagram. As also described above, the topology of the two-dimensional molecular diagram characterizes the chemical semantics of the target molecule, and the topology of the three-dimensional molecular diagram characterizes its spatial structure information. The two-dimensional features and three-dimensional features extracted from the atomic features can thus respectively characterize the chemical semantics and the spatial structure information of the target molecule, i.e., together they characterize the target molecule more comprehensively.
In S105, the mapping relationship may refer to a mapping relationship between the two-dimensional feature, the three-dimensional feature, and the chemical property, or may refer to a mapping relationship between the fusion feature and the chemical property calculated based on the two-dimensional feature and the three-dimensional feature.
If the mapping relationship is between the two-dimensional features, the three-dimensional features, and the chemical property, the two-dimensional features and three-dimensional features of the target molecule are mapped directly with the mapping relationship to obtain the chemical property of the target molecule.
If the mapping relationship is between the fusion feature and the chemical property, the fusion feature is first calculated from the two-dimensional features and three-dimensional features of the target molecule, by methods including but not limited to concatenation and weighted addition, and the fusion feature is then mapped with the mapping relationship to obtain the chemical property of the target molecule.
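The two fusion methods named above, concatenation and weighted addition, can be illustrated with a minimal sketch. The linear mapping used as the "mapping relationship", the weight values, and all function names are hypothetical stand-ins; the disclosure does not fix the form of the mapping.

```python
from typing import List

Vector = List[float]


def fuse_concat(f2d: Vector, f3d: Vector) -> Vector:
    """Fusion by concatenation (splicing) of the two feature vectors."""
    return list(f2d) + list(f3d)


def fuse_weighted(f2d: Vector, f3d: Vector, w2d: float = 0.5) -> Vector:
    """Fusion by weighted addition; both vectors must have the same length."""
    if len(f2d) != len(f3d):
        raise ValueError("weighted addition needs equal-length features")
    return [w2d * a + (1.0 - w2d) * b for a, b in zip(f2d, f3d)]


def predict_property(fused: Vector, weights: Vector, bias: float = 0.0) -> float:
    """Hypothetical mapping: a linear map from the fusion feature to a scalar property."""
    if len(fused) != len(weights):
        raise ValueError("weight vector must match the fusion feature")
    return sum(w * x for w, x in zip(weights, fused)) + bias


f2d, f3d = [1.0, 2.0], [3.0, 4.0]
concat = fuse_concat(f2d, f3d)           # [1.0, 2.0, 3.0, 4.0]
blended = fuse_weighted(f2d, f3d, 0.25)  # [2.5, 3.5]
print(predict_property(blended, [1.0, 1.0]))  # 6.0
```

Concatenation keeps the two views separate at the cost of a longer vector, while weighted addition keeps the dimension fixed but requires the two features to have the same length.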
As analyzed above, propagating feature information from atoms to edges and from edges to atoms allows the topology information of the molecular graph to be propagated into the atomic features of each atom. In one possible embodiment, to further accelerate the transfer of topology information, the following step may be included between the foregoing S102 and S103:
S106: for each edge in the two-dimensional molecular graph and the three-dimensional molecular graph, updating the edge features of the edge according to the included angles between the edge and its adjacent edges.
It can be understood that the included angles between an edge and its adjacent edges reflect the topology of the molecular graph to a certain extent, so updating the edge features of an edge according to these angles can be regarded as transmitting topology information from edge to edge.
With this embodiment, topology information is transmitted between edges before it is transmitted from edges to atoms, so that it is propagated more fully and richer topology information is carried in the subsequent edge-to-atom propagation. The atomic features of each atom therefore include richer features characterizing the topology, and the two-dimensional and three-dimensional features subsequently determined from the atomic features characterize the chemical semantics and spatial structure of the target molecule, i.e., the target molecule itself, more accurately, which further improves the accuracy of the predicted chemical properties.
In one possible embodiment, for the edge connecting atom u and atom v in the two-dimensional molecular graph, the edge feature is updated according to equation (6):
Wherein A(e_uv) represents the set of edges adjacent to the uv edge in the two-dimensional molecular diagram, φ_wuv represents the included angle between the uv edge and the edge connecting atoms u and w (hereinafter referred to as the uw edge), and the two weight terms in equation (6) are preset weight matrices.
The edge features of the uv edge in the three-dimensional molecular diagram can also be updated according to equation (6); the only difference is that the edge features of the uv edge in the two-dimensional molecular diagram are replaced by those of the uv edge in the three-dimensional molecular diagram.
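The included angle between the uv edge and the uw edge, which share atom u, can be computed from atom positions as in the following minimal sketch. The use of Cartesian coordinates as input and the function name are assumptions made for illustration; the text above does not state how the angle is obtained.

```python
import math
from typing import Sequence


def included_angle(
    pos_u: Sequence[float], pos_v: Sequence[float], pos_w: Sequence[float]
) -> float:
    """Return the angle (radians) at atom u between edge u-v and edge u-w."""
    uv = [b - a for a, b in zip(pos_u, pos_v)]
    uw = [b - a for a, b in zip(pos_u, pos_w)]
    norm_uv = math.sqrt(sum(x * x for x in uv))
    norm_uw = math.sqrt(sum(x * x for x in uw))
    if norm_uv == 0.0 or norm_uw == 0.0:
        raise ValueError("an edge must connect two distinct positions")
    cosine = sum(x * y for x, y in zip(uv, uw)) / (norm_uv * norm_uw)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, cosine)))


# Hypothetical coordinates: two perpendicular edges, then two opposite edges.
right_angle = included_angle((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
straight = included_angle((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (-2.0, 0.0, 0.0))
```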
It will be appreciated that edges in the three-dimensional molecular diagram represent the spatial relationship between atoms in the target molecule, whereas edges in the two-dimensional molecular diagram represent chemical bonds between atoms. Two atoms in the target molecule may have a spatial relationship even if there is no chemical bond between them, e.g., the two atoms are spatially close but have not formed a chemical bond. Therefore, the three-dimensional molecular graph often has more edges than the two-dimensional molecular graph, so each edge in the three-dimensional molecular graph has more adjacent edges, too many included angles must be referenced when updating its edge features, and the update efficiency is low.
Based on this, in one possible embodiment, for each edge in the three-dimensional molecular graph, the edge features of the edge are updated in the aforementioned S106 as follows:
For each angle domain, a three-dimensional representation of the angle domain is determined according to the included angles between the edge and the other edges in that angle domain, wherein the three-dimensional representation is positively correlated with the included angles. The three-dimensional representations of all angle domains are then pooled to obtain the new edge features of the edge.
For each edge, the other edges are divided into a plurality of angle domains according to the value interval into which the spatial angle between the other edge and the edge falls. Each other edge belongs to exactly one angle domain, and an angle domain may contain one or more other edges, or none. The angle domains may be divided differently according to actual requirements, but the other edges should be distributed across the angle domains as uniformly as possible.
With this embodiment, the topology information of the other edges within each angle domain is aggregated first, and the edge features of the edge are then updated according to the aggregated topology information. Because each angle domain contains fewer other edges than the total number of other edges, referencing too many included angles in a single update is avoided and the efficiency of updating edge features is effectively improved, which improves the efficiency of molecular characterization and hence of molecular property prediction.
In one possible embodiment, the three-dimensional representation for the i-th angular domain is calculated according to equation (7):
Wherein the left-hand side of equation (7) is the three-dimensional representation of the i-th angle domain, the edge-feature term is the edge feature of the uv edge in the three-dimensional molecular graph, the weight term is a preset weight coefficient, θ_wuv is the included angle between the uw edge and the uv edge, and A_i(e_uv) is the set of edges adjacent to the uv edge that fall within the i-th angle domain.
In this embodiment, the three-dimensional representation of each angular domain is pooled by equation (8):
Wherein the left-hand side of equation (8) is the new edge feature of the uv edge, n is the number of angle domains, and Pool(·) is the max pooling function.
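The angle-domain scheme can be sketched as follows. Since equation (7) is not reproduced in this text, the per-domain representation used here, a sum of neighbouring edge features each scaled by its included angle, is only a stand-in that satisfies the stated positive correlation with the angle. Equal-width angle domains over [0, π], the skipping of empty domains, and all names are likewise assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def angle_domain_index(theta: float, n_domains: int) -> int:
    """Map an angle in [0, pi] to one of n equal-width angle domains."""
    if not 0.0 <= theta <= math.pi:
        raise ValueError("angle must lie in [0, pi]")
    return min(int(theta / math.pi * n_domains), n_domains - 1)


def domain_representations(
    neighbors: Sequence[Tuple[float, Vector]], n_domains: int, dim: int
) -> Tuple[List[Vector], List[int]]:
    """Aggregate (angle, feature) pairs per angle domain; also count members."""
    reps = [[0.0] * dim for _ in range(n_domains)]
    counts = [0] * n_domains
    for theta, feat in neighbors:
        i = angle_domain_index(theta, n_domains)
        counts[i] += 1
        # Stand-in for equation (7): scale the feature by the angle and sum.
        reps[i] = [r + theta * f for r, f in zip(reps[i], feat)]
    return reps, counts


def max_pool(reps: List[Vector], counts: List[int]) -> Vector:
    """Element-wise max over the non-empty angle domains."""
    non_empty = [r for r, c in zip(reps, counts) if c > 0]
    if not non_empty:
        raise ValueError("the edge has no adjacent edges")
    return [max(column) for column in zip(*non_empty)]


# Hypothetical adjacent edges of one uv edge: (included angle, edge feature).
neighbors = [(0.5, [1.0, 0.0]), (1.0, [0.0, 2.0]), (2.0, [1.0, 1.0])]
reps, counts = domain_representations(neighbors, n_domains=3, dim=2)
pooled = max_pool(reps, counts)
print(counts, pooled)  # [2, 1, 0] [2.0, 2.0]
```

With n angle domains the pooling step touches at most n vectors per edge, regardless of how many adjacent edges the edge has.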
In one possible embodiment, the foregoing S104 is implemented by:
And carrying out attention pooling on the atomic characteristics of each atom in the two-dimensional molecular diagram and the three-dimensional molecular diagram respectively to obtain the two-dimensional characteristics of the two-dimensional molecular diagram and the three-dimensional characteristics of the three-dimensional molecular diagram.
The attention pooling refers to pooling operation based on an attention mechanism, that is, different weights are set for different atoms in the process of the pooling operation, so that the atomic characteristics of the different atoms have different influence degrees on the pooling result.
With this embodiment, the attention mechanism can assign relatively higher weights to the atomic features of more important atoms and relatively lower weights to those of less important atoms, so that the resulting two-dimensional and three-dimensional features characterize the more important atoms more prominently. It will be appreciated that although every atom in the target molecule affects its chemical properties to some extent, different atoms do so to different degrees. By way of example, if the chemical properties of the target molecule are determined predominantly by its hydroxyl groups, atoms belonging to a hydroxyl group affect the chemical properties more than atoms that do not. Therefore, the more prominently the molecular characterization reflects the atomic features of the relatively important atoms, the more accurately it reflects the chemical properties of the target molecule. In other words, this embodiment can further improve the accuracy of the obtained two-dimensional and three-dimensional features and thus the accuracy of the predicted chemical properties.
The manner in which attention is pooled is exemplified below, and in one possible embodiment attention pooling is performed according to equation (9):
$h_{t+1} = \mathrm{GRU}(h_t, g_t)$ … (9)
Wherein t is an integer in the range [0, T_g], T_g is the number of pooling iterations in the attention pooling process, and GRU(·) is the function of a gated recurrent unit (Gated Recurrent Unit). When determining the two-dimensional features, h_0 is the sum of the latest atomic features of all atoms in the two-dimensional molecular diagram; when determining the three-dimensional features, h_0 is the sum of the latest atomic features of all atoms in the three-dimensional molecular diagram.
g_t is calculated by equation (10):
Wherein a_v is the latest atomic feature of atom v in the two-dimensional molecular diagram when determining the two-dimensional features, and the latest atomic feature of atom v in the three-dimensional molecular diagram when determining the three-dimensional features; V is the set of atoms; Softmax(·) is the normalized exponential function; and the weight term is a preset weight matrix.
α_v is calculated by equation (11):
$\alpha_v = \mathrm{LeakyReLU}(q^{T}[h_t \,\|\, a_v])$ … (11)
Wherein LeakyReLU(·) is an activation function, q^T is a preset weight matrix, and [h_t ‖ a_v] denotes the concatenation of h_t and a_v.
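One attention step can be sketched as follows. The score follows equation (11) with the bracket read as concatenation. Because equation (10) is not reproduced in this text, the form of g_t used here, a Softmax-weighted sum of the atomic features with the preset weight matrix taken as the identity, is an assumption. The GRU update of equation (9) is deliberately left out, and all names and values are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def leaky_relu(x: float, slope: float = 0.01) -> float:
    """LeakyReLU activation with an assumed negative slope of 0.01."""
    return x if x > 0.0 else slope * x


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_step(
    h_t: Vector, atom_features: List[Vector], q: Vector
) -> Tuple[List[float], Vector]:
    """Score each atom against h_t, then form g_t as the weighted feature sum."""
    scores = []
    for a_v in atom_features:
        concat = list(h_t) + list(a_v)  # [h_t || a_v]
        scores.append(leaky_relu(sum(q_i * z_i for q_i, z_i in zip(q, concat))))
    weights = softmax(scores)
    dim = len(atom_features[0])
    g_t = [sum(w * a_v[k] for w, a_v in zip(weights, atom_features)) for k in range(dim)]
    return weights, g_t


atoms = [[1.0, 0.0], [0.0, 1.0]]         # latest atomic features of two atoms
h_0 = [sum(col) for col in zip(*atoms)]  # h_0 is the sum of atomic features
q = [0.0, 0.0, 1.0, 0.0]                 # hypothetical preset weights
weights, g_t = attention_step(h_0, atoms, q)
```

In a full implementation g_t and h_t would then be passed to a GRU cell to obtain h_{t+1}, and the step would be repeated T_g times.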
The molecular property prediction method provided by the present disclosure can be implemented by a neural network obtained through deep learning training, or by an algorithm model generated by traditional machine learning. To explain the method more clearly, the case of implementation by a neural network is described below by way of example. Fig. 2 shows a schematic structural diagram of a message passing network (Message Passing Neural Networks, MPNN) provided by the present disclosure, which is used to implement the foregoing steps S102-S104 and S106.
The message passing network includes a plurality of message passing layers and a pooling module, and each message passing layer includes: a two-dimensional atom-to-edge propagation module, a two-dimensional edge-to-edge propagation module, a two-dimensional edge-to-atom propagation module, a three-dimensional atom-to-edge propagation module, a three-dimensional edge-to-edge propagation module, and a three-dimensional edge-to-atom propagation module.
The input of the two-dimensional atom-to-edge propagation module includes the atom set, the edge set, the lengths of all edges, and the included angles between edges in the two-dimensional molecular graph. Except in the first message passing layer, the input of the two-dimensional atom-to-edge propagation module also includes the output of the two-dimensional edge-to-atom propagation module of the previous message passing layer. The two-dimensional atom-to-edge propagation module implements the part of S102 that, for each edge in the two-dimensional molecular graph, aggregates the atomic features of the two atoms connected by the edge and the edge features of the edge to obtain new edge features of the edge. The new edge features are input to the two-dimensional edge-to-edge propagation module.
The input of the two-dimensional edge-to-edge propagation module includes the atom set, the edge set, the lengths of all edges, and the included angles between edges in the two-dimensional molecular graph, together with the new edge features of all edges output by the two-dimensional atom-to-edge propagation module. The two-dimensional edge-to-edge propagation module implements the part of S106 that, for each edge in the two-dimensional molecular graph, updates the edge features of the edge according to the included angles between the edge and its adjacent edges. The updated edge features of each edge are input to the two-dimensional edge-to-atom propagation module.
The input of the two-dimensional edge-to-atom propagation module includes the atom set, the edge set, the lengths of all edges, and the included angles between edges in the two-dimensional molecular graph, together with the updated edge features of all edges output by the two-dimensional edge-to-edge propagation module. Except in the first message passing layer, the input of the two-dimensional edge-to-atom propagation module also includes the output of the two-dimensional edge-to-atom propagation module of the previous message passing layer. The two-dimensional edge-to-atom propagation module implements the part of S103 that, for each atom in the two-dimensional molecular graph, aggregates the edge features of the edges connected to the atom to obtain and output new atomic features of the atom.
The input of the three-dimensional atom-to-edge propagation module includes the atom set, the lengths of all edges, and the included angles between edges in the three-dimensional molecular graph. Except in the first message passing layer, the input of the three-dimensional atom-to-edge propagation module also includes the output of the three-dimensional edge-to-atom propagation module of the previous message passing layer. The three-dimensional atom-to-edge propagation module implements the part of S102 that, for each edge in the three-dimensional molecular graph, aggregates the atomic features of the two atoms connected by the edge and the edge features of the edge to obtain new edge features of the edge. The new edge features are input to the three-dimensional edge-to-edge propagation module.
The input of the three-dimensional edge-to-edge propagation module includes the atom set, the lengths of all edges, and the included angles between edges in the three-dimensional molecular graph, together with the new edge features of all edges output by the three-dimensional atom-to-edge propagation module. The three-dimensional edge-to-edge propagation module implements the part of S106 that, for each edge in the three-dimensional molecular graph, updates the edge features of the edge according to the included angles between the edge and its adjacent edges. The updated edge features of each edge are input to the three-dimensional edge-to-atom propagation module.
The input of the three-dimensional edge-to-atom propagation module includes the atom set, the lengths of all edges, and the included angles between edges in the three-dimensional molecular graph, together with the updated edge features of all edges output by the three-dimensional edge-to-edge propagation module. Except in the first message passing layer, the input of the three-dimensional edge-to-atom propagation module also includes the output of the three-dimensional edge-to-atom propagation module of the previous message passing layer. The three-dimensional edge-to-atom propagation module implements the part of S103 that, for each atom in the three-dimensional molecular graph, aggregates the edge features of the edges connected to the atom to obtain and output new atomic features of the atom.
The input of the pooling module is the output of the two-dimensional edge-to-atom propagation module and the three-dimensional edge-to-atom propagation module in the last message passing layer. The pooling module implements the foregoing S104, and its output is the two-dimensional features and the three-dimensional features.
It can be understood that, in the message passing network, the outputs of the two-dimensional atom-to-edge, edge-to-edge, and edge-to-atom propagation modules are two-dimensional feature vectors, and the outputs of the three-dimensional atom-to-edge, edge-to-edge, and edge-to-atom propagation modules are three-dimensional feature vectors.
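The ordering of the three modules inside each message passing layer can be sketched as a simple loop. The three toy modules below are hypothetical stand-ins (plain sums) used only to make the data flow runnable; they are not the modules of Fig. 2. Carrying the edge features from one layer to the next is also an assumption.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Atoms = Dict[int, Vector]
Edges = Dict[Tuple[int, int], Vector]


def run_branch(
    atoms: Atoms,
    edges: Edges,
    n_layers: int,
    atom_to_edge: Callable[[Atoms, Edges], Edges],
    edge_to_edge: Callable[[Edges], Edges],
    edge_to_atom: Callable[[Atoms, Edges], Atoms],
) -> Atoms:
    """Apply n message passing layers to one branch (2D or 3D) of the network."""
    for _ in range(n_layers):
        edges = atom_to_edge(atoms, edges)  # S102: atoms -> edges
        edges = edge_to_edge(edges)         # S106: edges -> edges
        atoms = edge_to_atom(atoms, edges)  # S103: edges -> atoms
    return atoms


def toy_atom_to_edge(atoms: Atoms, edges: Edges) -> Edges:
    """Stand-in: add the two endpoint atom features to the edge feature."""
    return {
        (u, v): [e + a + b for e, a, b in zip(feat, atoms[u], atoms[v])]
        for (u, v), feat in edges.items()
    }


def toy_edge_to_edge(edges: Edges) -> Edges:
    """Stand-in: leave edge features unchanged."""
    return dict(edges)


def toy_edge_to_atom(atoms: Atoms, edges: Edges) -> Atoms:
    """Stand-in: each atom becomes the sum of the features of its edges."""
    new_atoms = {k: [0.0] * len(f) for k, f in atoms.items()}
    for (u, v), feat in edges.items():
        for atom in (u, v):
            new_atoms[atom] = [x + y for x, y in zip(new_atoms[atom], feat)]
    return new_atoms


atoms_in = {0: [1.0], 1: [2.0]}
edges_in = {(0, 1): [0.5]}
atoms_out = run_branch(
    atoms_in, edges_in, 2, toy_atom_to_edge, toy_edge_to_edge, toy_edge_to_atom
)
print(atoms_out)  # {0: [10.5], 1: [10.5]}
```

The same loop would be run once with the two-dimensional modules and once with the three-dimensional modules, and the two sets of final atomic features would then be passed to the pooling module.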
How the message passing network shown in Fig. 2 is trained is explained below. In one possible embodiment, the network is trained by supervised learning using sample molecules labeled with two-dimensional features and three-dimensional features. In another possible embodiment, the network is trained by contrastive learning: for example, the two-dimensional and three-dimensional molecular graphs of each sample molecule are input into the message passing network to obtain the two-dimensional and three-dimensional features of each sample molecule output by the network, and the network parameters of the message passing network, such as the weight matrices in the foregoing equations, are adjusted according to the differences between the two-dimensional features and the three-dimensional features.
After the message passing network has been trained in this way, it may be trained a second time according to actual requirements. Because the first training is carried out before the second training, the first training is called pre-training.
In the second training, the two-dimensional molecular diagram and three-dimensional molecular diagram of a sample molecule labeled with a chemical property are input into the pre-trained message passing network to obtain two-dimensional features and three-dimensional features, the chemical property of the sample molecule is predicted according to these features, and the model parameters of the message passing network are adjusted according to the difference between the predicted chemical property and the labeled chemical property.
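The contrastive pre-training objective is described above only as adjusting parameters "according to the differences" between the two-dimensional and three-dimensional features. One common way to quantify such a difference is an InfoNCE-style loss, sketched below. This particular loss, the cosine similarity, the temperature value, and all names are assumptions and are not stated in the disclosure.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def contrastive_loss(
    feats_2d: List[Vector], feats_3d: List[Vector], temperature: float = 0.1
) -> float:
    """InfoNCE-style loss: the 2D and 3D features of the same molecule are the
    positive pair; the 3D features of the other molecules act as negatives."""
    if len(feats_2d) != len(feats_3d):
        raise ValueError("need one 2D and one 3D feature per molecule")
    total = 0.0
    for i, f2 in enumerate(feats_2d):
        logits = [cosine(f2, f3) / temperature for f3 in feats_3d]
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(feats_2d)


# Hypothetical features for two sample molecules.
feats_2d = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss(feats_2d, [[1.0, 0.0], [0.0, 1.0]])
misaligned = contrastive_loss(feats_2d, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is small when each molecule's two-dimensional feature is closest to its own three-dimensional feature and large otherwise, so minimizing it pulls the two views of the same molecule together.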
Referring to fig. 3, fig. 3 is a schematic structural diagram of a molecular property prediction apparatus provided in the present disclosure, including:
A molecular diagram generating module 301, configured to generate a two-dimensional molecular diagram and a three-dimensional molecular diagram of a target molecule according to a molecular formula of the target molecule, where the two-dimensional molecular diagram and the three-dimensional molecular diagram include atoms and edges, the atoms in the two-dimensional molecular diagram and the three-dimensional molecular diagram are used to represent atoms in the target molecule, the edges in the two-dimensional molecular diagram are used to represent chemical bonds between atoms in the target molecule, and the edges in the three-dimensional molecular diagram are used to represent a positional relationship between atoms in the target molecule;
An atom-to-edge propagation module 302, configured to aggregate, for each edge in the two-dimensional molecular graph and the three-dimensional molecular graph, the atomic features of the two atoms connected by the edge and the edge features of the edge, to obtain new edge features of the edge;
An edge-to-atom propagation module 303, configured to aggregate, for each atom in the two-dimensional molecular diagram and the three-dimensional molecular diagram, edge features connected to respective edges of the atom, to obtain a new atom feature of the atom;
A feature determining module 304, configured to determine the two-dimensional features of the two-dimensional molecular diagram and the three-dimensional features of the three-dimensional molecular diagram according to the atomic features of each atom;
A property prediction module 305, configured to determine, according to a preset mapping relationship between features and chemical properties, the chemical property corresponding to the two-dimensional features and the three-dimensional features as the chemical property of the target molecule.
In one possible embodiment, the apparatus further comprises:
An edge-to-edge propagation module, configured to update, for each edge in the two-dimensional molecular graph and the three-dimensional molecular graph, the edge features of the edge according to the included angles between the edge and its adjacent edges.
In one possible embodiment, the edge-to-edge propagation module updates an edge feature of the edge according to an angle between the edge and an adjacent edge of the edge, including:
For each angle domain, determining a three-dimensional representation of the angle domain according to the included angle between the edge and other edges in the angle domain, wherein the three-dimensional representation is positively correlated with the included angle;
Pooling the three-dimensional representation of each of the angular domains to obtain new edge features of the edge.
In one possible embodiment, the feature determining module 304 determines a two-dimensional feature of the two-dimensional molecular graph and a three-dimensional feature of the three-dimensional molecular graph according to an atomic feature of each atom, including:
And carrying out attention pooling on the atomic characteristics of each atom in the two-dimensional molecular diagram and the three-dimensional molecular diagram respectively to obtain the two-dimensional characteristics of the two-dimensional molecular diagram and the three-dimensional characteristics of the three-dimensional molecular diagram.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of users' personal information comply with relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 4 illustrates a schematic block diagram of an example electronic device 400 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in Fig. 4, the device 400 includes a computing unit 401 that can perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 402 or a computer program loaded from a storage unit 408 into a Random Access Memory (RAM) 403. In RAM 403, various programs and data required for the operation of device 400 may also be stored. The computing unit 401, ROM 402, and RAM 403 are connected to each other by a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
Various components in device 400 are connected to I/O interface 405, including: an input unit 406 such as a keyboard, a mouse, etc.; an output unit 407 such as various types of displays, speakers, and the like; a storage unit 408, such as a magnetic disk, optical disk, etc.; and a communication unit 409 such as a network card, modem, wireless communication transceiver, etc. The communication unit 409 allows the device 400 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 401 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 401 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 401 performs the respective methods and processes described above, such as the molecular property prediction method. For example, in some embodiments, the molecular property prediction method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 408. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 400 via the ROM 402 and/or the communication unit 409. When the computer program is loaded into RAM 403 and executed by computing unit 401, one or more steps of the molecular property prediction method described above may be performed. Alternatively, in other embodiments, the computing unit 401 may be configured to perform the molecular property prediction method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed herein are achieved; no limitation is imposed in this respect.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (11)

1. A method of predicting molecular properties, comprising:
generating a two-dimensional molecular diagram and a three-dimensional molecular diagram of a target molecule according to the molecular formula of the target molecule, wherein the two-dimensional molecular diagram and the three-dimensional molecular diagram each comprise atoms and edges, the atoms in both diagrams represent the atoms in the target molecule, the edges in the two-dimensional molecular diagram represent chemical bonds between atoms in the target molecule, and the edges in the three-dimensional molecular diagram represent positional relationships between atoms in the target molecule;
for each edge in the two-dimensional molecular diagram and the three-dimensional molecular diagram, aggregating, based on a message transfer function, the atomic features of the two atoms connected by the edge and the edge feature of the edge to obtain a new edge feature of the edge;
for each atom in the two-dimensional molecular diagram and the three-dimensional molecular diagram, obtaining a new atomic feature of the atom by aggregating, through weighted summation, the new edge features of the edges connected to the atom;
determining a two-dimensional feature of the two-dimensional molecular diagram and a three-dimensional feature of the three-dimensional molecular diagram according to the atomic features of the atoms; and
determining, according to a preset mapping relationship between features and chemical properties, the chemical property corresponding to the two-dimensional feature and the three-dimensional feature as the chemical property of the target molecule.
2. The method of claim 1, further comprising:
for each edge in the two-dimensional molecular diagram and the three-dimensional molecular diagram, updating the edge feature of the edge according to the included angles between the edge and its adjacent edges.
3. The method of claim 2, wherein updating the edge feature of the edge according to the included angles between the edge and its adjacent edges comprises:
for each angle domain, determining a three-dimensional representation of the angle domain according to the included angles between the edge and the other edges in the angle domain, wherein the three-dimensional representation is positively correlated with the included angle; and
pooling the three-dimensional representations of the angle domains to obtain a new edge feature of the edge.
4. The method of claim 1, wherein determining the two-dimensional feature of the two-dimensional molecular diagram and the three-dimensional feature of the three-dimensional molecular diagram according to the atomic features of the atoms comprises:
performing attention pooling on the atomic features of the atoms in the two-dimensional molecular diagram and in the three-dimensional molecular diagram, respectively, to obtain the two-dimensional feature of the two-dimensional molecular diagram and the three-dimensional feature of the three-dimensional molecular diagram.
5. A molecular property prediction apparatus comprising:
a molecular diagram generation module, configured to generate a two-dimensional molecular diagram and a three-dimensional molecular diagram of a target molecule according to the molecular formula of the target molecule, wherein the two-dimensional molecular diagram and the three-dimensional molecular diagram each comprise atoms and edges, the atoms in both diagrams represent the atoms in the target molecule, the edges in the two-dimensional molecular diagram represent chemical bonds between atoms in the target molecule, and the edges in the three-dimensional molecular diagram represent positional relationships between atoms in the target molecule;
an atomic propagation module, configured to aggregate, for each edge in the two-dimensional molecular diagram and the three-dimensional molecular diagram and based on a message transfer function, the atomic features of the two atoms connected by the edge and the edge feature of the edge to obtain a new edge feature of the edge;
an edge-to-atom propagation module, configured to obtain, for each atom in the two-dimensional molecular diagram and the three-dimensional molecular diagram, a new atomic feature of the atom by aggregating, through weighted summation, the new edge features of the edges connected to the atom;
a feature determination module, configured to determine a two-dimensional feature of the two-dimensional molecular diagram and a three-dimensional feature of the three-dimensional molecular diagram according to the atomic features of the atoms; and
a property prediction module, configured to determine, according to a preset mapping relationship between features and chemical properties, the chemical property corresponding to the two-dimensional feature and the three-dimensional feature as the chemical property of the target molecule.
6. The apparatus of claim 5, further comprising:
an edge-to-edge propagation module, configured to update, for each edge in the two-dimensional molecular diagram and the three-dimensional molecular diagram, the edge feature of the edge according to the included angles between the edge and its adjacent edges.
7. The apparatus of claim 6, wherein the edge-to-edge propagation module updating the edge feature of the edge according to the included angles between the edge and its adjacent edges comprises:
for each angle domain, determining a three-dimensional representation of the angle domain according to the included angles between the edge and the other edges in the angle domain, wherein the three-dimensional representation is positively correlated with the included angle; and
pooling the three-dimensional representations of the angle domains to obtain a new edge feature of the edge.
8. The apparatus of claim 5, wherein the feature determination module determining the two-dimensional feature of the two-dimensional molecular diagram and the three-dimensional feature of the three-dimensional molecular diagram according to the atomic features of the atoms comprises:
performing attention pooling on the atomic features of the atoms in the two-dimensional molecular diagram and in the three-dimensional molecular diagram, respectively, to obtain the two-dimensional feature of the two-dimensional molecular diagram and the three-dimensional feature of the three-dimensional molecular diagram.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-4.
10. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-4.
11. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-4.
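The propagation and readout steps recited in claims 1 and 4 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `tanh`-of-sum message function, the uniform default edge weight, the dot-product attention with a fixed `query` vector, and all function names are hypothetical choices made only for illustration. The sketch handles a single graph; under the claims the same steps would be applied to the two-dimensional and the three-dimensional molecular diagram separately, and the final mapping from features to chemical properties is omitted.

```python
import math

def message(h_u, h_v, e_uv):
    # Hypothetical message transfer function: tanh of the element-wise sum.
    return [math.tanh(a + b + c) for a, b, c in zip(h_u, h_v, e_uv)]

def atom_to_edge(atoms, edges):
    # New edge feature from the two endpoint atoms and the old edge feature.
    return {(u, v): message(atoms[u], atoms[v], e) for (u, v), e in edges.items()}

def edge_to_atom(atoms, new_edges, weight=lambda u, v: 1.0):
    # New atomic feature: weighted sum of the new features of incident edges.
    # Atoms without any edge end up with a zero vector in this sketch.
    dim = len(next(iter(atoms.values())))
    out = {a: [0.0] * dim for a in atoms}
    for (u, v), e in new_edges.items():
        w = weight(u, v)  # assumed scalar weight per edge
        for a in (u, v):
            out[a] = [x + w * y for x, y in zip(out[a], e)]
    return out

def attention_pool(atoms, query):
    # Graph-level feature: softmax-weighted average of atomic features,
    # with scores given by the dot product with an assumed query vector.
    keys = list(atoms)
    scores = [sum(q * x for q, x in zip(query, atoms[k])) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [sum(exps[i] / z * atoms[k][j] for i, k in enumerate(keys))
            for j in range(len(query))]

# Toy three-atom graph with two edges (feature values are made up).
atoms = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 1.0]}
edges = {(0, 1): [0.5, 0.5], (0, 2): [0.5, 0.5]}
new_edges = atom_to_edge(atoms, edges)
new_atoms = edge_to_atom(atoms, new_edges)
graph_feature = attention_pool(new_atoms, query=[0.0, 0.0])
```

With an all-zero `query`, the attention weights are uniform and the readout reduces to a plain average of the atomic features; a learned query would weight atoms unequally.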
CN202210165349.5A 2022-02-17 2022-02-17 Molecular property prediction method and device and electronic equipment Active CN114446413B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210165349.5A CN114446413B (en) 2022-02-17 2022-02-17 Molecular property prediction method and device and electronic equipment


Publications (2)

Publication Number Publication Date
CN114446413A CN114446413A (en) 2022-05-06
CN114446413B CN114446413B (en) 2024-05-28

Family

ID=81373997


Country Status (1)

Country Link
CN (1) CN114446413B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0106441D0 (en) * 2001-03-15 2001-05-02 Bayer Ag Method for generating a hierarchical topological tree of 2D or 3D-structural formulas of chemical compounds for property optimization of chemical compounds
CN110767271A (en) * 2019-10-15 2020-02-07 腾讯科技(深圳)有限公司 Compound property prediction method, device, computer device and readable storage medium
CN112185477A (en) * 2020-09-25 2021-01-05 北京望石智慧科技有限公司 Method and device for extracting molecular characteristics and calculating three-dimensional quantitative structure-activity relationship
CN113140267A (en) * 2021-03-25 2021-07-20 北京化工大学 Directional molecule generation method based on graph neural network
CN113241130A (en) * 2021-06-08 2021-08-10 西南交通大学 Molecular structure prediction method based on graph convolution network
CN113241126A (en) * 2021-05-18 2021-08-10 百度时代网络技术(北京)有限公司 Method and apparatus for training predictive models for determining molecular binding

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10622098B2 (en) * 2017-09-12 2020-04-14 Massachusetts Institute Of Technology Systems and methods for predicting chemical reactions


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
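VAX graphics workstation for three-dimensional molecular models; 张福贵; 朱敏慧; 刘丽; 范依文; 计算机与应用化学 (Computers and Applied Chemistry); 1989-07-31; Vol. 6, No. 03; pp. 20-26 *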


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant