CN116306952A - Molecular property prediction method and device, storage medium and electronic device
- Publication number
- CN116306952A (CN202310250808.4A)
- Authority
- CN
- China
- Prior art keywords
- feature vector
- node
- graph data
- correlation coefficient
- predicted
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N10/00—Quantum computing, i.e. information processing based on quantum-mechanical phenomena
- G06N10/20—Models of quantum computing, e.g. quantum circuits or universal quantum computers
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C10/00—Computational theoretical chemistry, i.e. ICT specially adapted for theoretical aspects of quantum chemistry, molecular mechanics, molecular dynamics or the like
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/20—Identification of molecular entities, parts thereof or of chemical compositions
Abstract
The application discloses a molecular property prediction method and device, a storage medium, and an electronic device, and relates to the technical field of quantum computing. The method comprises: obtaining graph data of a molecule to be predicted, wherein the feature vector of each node of the graph data represents one atom of the molecule to be predicted; for each node in the graph data, performing feature enhancement on the feature vector of the node and the feature vector of each adjacent node, and concatenating them to obtain a high-dimensional feature vector; mapping each high-dimensional feature vector through a variational quantum circuit to obtain a correlation coefficient between adjacent node feature vectors in the graph data; and determining a molecular property prediction result for the molecule to be predicted based on the correlation coefficient. The method can improve the accuracy of molecular property prediction.
Description
Technical Field
The application belongs to the technical field of quantum computing, and in particular relates to a molecular property prediction method and device, a storage medium, and an electronic device.
Background
A quantum computer is a physical device that performs high-speed mathematical and logical operations and stores and processes quantum information according to the laws of quantum mechanics. Any device that processes and computes quantum information and runs quantum algorithms is a quantum computer. Quantum computers have become a key technology under study because they can handle certain mathematical problems more efficiently than ordinary computers; for example, they could reduce the time needed to crack an RSA key from hundreds of years to hours.
Molecular property prediction plays an important role in many fields, such as drug discovery, chemical reactions, and catalyst generation. In drug discovery in particular, accurate prediction of molecular properties can accelerate the overall process of finding candidate drugs.
Currently, molecules are usually represented as graph data (Graph), and the graph data is processed with graph neural networks (Graph Neural Networks, GNNs). However, graph convolutional networks (Graph Convolutional Networks, GCNs), which rely on a convolution mechanism, have limitations in processing graph data, so complex molecular graph data is difficult to process and the accuracy of molecular property prediction results is low.
Disclosure of Invention
The purpose of the application is to provide a molecular property prediction method, a device, a storage medium and an electronic device, which aim to improve the accuracy of a molecular property prediction result.
To achieve the above object, according to a first aspect of embodiments of the present application, there is provided a molecular property prediction method, including:
obtaining graph data of a molecule to be predicted, wherein a feature vector of each node of the graph data represents one atom of the molecule to be predicted;
for each node in the graph data, performing feature enhancement on the feature vector of the node and the feature vector of each adjacent node, and concatenating them to obtain a high-dimensional feature vector;
mapping each high-dimensional feature vector through a variational quantum circuit to obtain a correlation coefficient between adjacent node feature vectors in the graph data;
and determining a molecular property prediction result of the molecule to be predicted based on the correlation coefficient.
Optionally, the variational quantum circuit includes an encoding layer, a parameter-containing layer, and a measurement layer, and mapping each high-dimensional feature vector through the variational quantum circuit to obtain a correlation coefficient between adjacent nodes in the graph data includes:
encoding each high-dimensional feature vector onto a preset number of qubits using the encoding layer;
performing quantum state evolution on the encoded qubits using the parameter-containing layer;
and measuring the quantum state of any one of the evolved qubits using the measurement layer, and calculating the expected value of the measurement result to obtain the correlation coefficient.
Optionally, the encoding layer includes an H gate and a first RY gate, and encoding each high-dimensional feature vector onto a preset number of qubits using the encoding layer includes:
applying an H gate to each of a preset number of qubits so that each qubit evolves from its initial state to a superposition state, and applying a first RY gate to each qubit with quantum gate parameters determined by the high-dimensional feature vector, so that the high-dimensional feature vector is encoded onto the preset number of qubits in the superposition state.
Optionally, the parameter-containing layer includes an RX gate, a CNOT gate, and a second RY gate.
Optionally, determining the molecular property prediction result of the molecule to be predicted based on the correlation coefficient includes:
performing graph attention calculation according to the correlation coefficient, and performing a graph pooling operation on the calculation result to obtain a graph data representation of the molecule to be predicted;
and inputting the graph data representation of the molecule to be predicted into a fully-connected network to obtain the molecular property prediction result of the molecule to be predicted.
Optionally, performing graph attention calculation according to the correlation coefficient and performing a graph pooling operation on the calculation result to obtain a graph data representation of the molecule to be predicted includes:
normalizing the correlation coefficient to obtain an attention coefficient;
updating the feature vector of each node of the graph data using the attention coefficient to obtain updated graph data;
and performing a graph pooling operation on the updated graph data to obtain the graph data representation of the molecule to be predicted.
Optionally, the correlation coefficient is normalized by the following formula:
$$\alpha_{ij} = \frac{\exp\left(\sigma(e_{ij})\right)}{\sum_{k=1}^{N} \exp\left(\sigma(e_{ik})\right)}$$
where $\alpha_{ij}$ represents the attention coefficient, $e_{ij}$ represents the correlation coefficient of the feature vectors of two adjacent nodes in the graph data, $k$ is the summation index, $N$ is the number of adjacent nodes of the i-th node, and $\sigma$ represents an activation function.
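The normalization described here behaves like a softmax over the correlation coefficients of one node's neighbours. The following Python sketch illustrates that reading. It assumes LeakyReLU as the activation function (the text only says "an activation function"), and the names `leaky_relu` and `attention_coefficients` are hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    # Assumed activation; the source does not name a specific function.
    return x if x >= 0 else slope * x


def attention_coefficients(correlations):
    """Normalize the correlation coefficients e_ij of one node i.

    `correlations` holds one value per adjacent node j of node i.
    Returns attention coefficients that are positive and sum to 1.
    """
    scores = [leaky_relu(e) for e in correlations]
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


# Node i with three neighbours and illustrative correlation coefficients.
alpha = attention_coefficients([0.8, -0.3, 0.1])
print(alpha)
```

A larger correlation coefficient yields a larger attention coefficient, so the neighbour that the circuit scores highest contributes most to the node update.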
Optionally, the feature vector of each node of the graph data is updated by the following formula:
$$h_i' = \Big\Vert_{k=1}^{K} \sigma\left(\sum_{j=1}^{N} \alpha_{ij}^{k} W h_j\right)$$
where the double vertical line represents concatenation, $h_i'$ represents the updated feature vector of the i-th node, $W$ represents the weight matrix of the linear transformation, $h_j$ represents the feature vector of the j-th node before the update, $\sigma$ represents the activation function, and $K$ represents the number of independent attention coefficient matrices.
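As a rough illustration of this update, the sketch below aggregates the linearly transformed neighbour features under each of K sets of attention coefficients and concatenates the K results. It is written under assumptions: `tanh` stands in for the unspecified activation function, each head is given its own weight matrix (passing the same matrix to every head also works), and the function names are hypothetical.

```python
import math


def matvec(W, h):
    # Linear transformation W h for a matrix given as a list of rows.
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def update_node(neighbour_feats, alphas_per_head, W_per_head, act=math.tanh):
    """Return the updated feature vector of one node.

    neighbour_feats: feature vectors h_j of the adjacent nodes.
    alphas_per_head: for each head, one attention coefficient per neighbour.
    W_per_head: for each head, the weight matrix of the linear transformation.
    """
    updated = []
    for alphas, W in zip(alphas_per_head, W_per_head):
        agg = [0.0] * len(W)
        for a, h in zip(alphas, neighbour_feats):
            agg = [s + a * v for s, v in zip(agg, matvec(W, h))]
        updated.extend(act(v) for v in agg)  # concatenate the heads
    return updated


identity = [[1.0, 0.0], [0.0, 1.0]]
h_new = update_node(
    neighbour_feats=[[1.0, 0.0], [0.0, 1.0]],
    alphas_per_head=[[0.5, 0.5], [1.0, 0.0]],
    W_per_head=[identity, identity],
)
print(h_new)
```

With two heads and two-dimensional features, the updated vector has four entries, one block per head.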
In a second aspect of embodiments of the present application, there is provided a molecular property prediction apparatus, the apparatus comprising:
an acquisition module, configured to acquire graph data of a molecule to be predicted, wherein a feature vector of each node of the graph data represents one atom of the molecule to be predicted;
a concatenation module, configured to, for each node in the graph data, perform feature enhancement on the feature vector of the node and the feature vector of each adjacent node and concatenate them to obtain a high-dimensional feature vector;
a mapping module, configured to map each high-dimensional feature vector through a variational quantum circuit to obtain a correlation coefficient between adjacent node feature vectors in the graph data;
and a determining module, configured to determine a molecular property prediction result of the molecule to be predicted based on the correlation coefficient.
Optionally, the variational quantum circuit includes an encoding layer, a parameter-containing layer, and a measurement layer, and the mapping module is specifically configured to:
encode each high-dimensional feature vector onto a preset number of qubits using the encoding layer;
perform quantum state evolution on the encoded qubits using the parameter-containing layer;
and measure the quantum state of any one of the evolved qubits using the measurement layer, and calculate the expected value of the measurement result to obtain the correlation coefficient.
Optionally, the encoding layer includes an H gate and a first RY gate, and the mapping module is specifically configured to:
apply an H gate to each of a preset number of qubits so that each qubit evolves from its initial state to a superposition state, and apply a first RY gate to each qubit with quantum gate parameters determined by the high-dimensional feature vector, so that the high-dimensional feature vector is encoded onto the preset number of qubits in the superposition state.
Optionally, the parameter-containing layer includes an RX gate, a CNOT gate, and a second RY gate.
Optionally, the determining module is specifically configured to:
perform graph attention calculation according to the correlation coefficient, and perform a graph pooling operation on the calculation result to obtain a graph data representation of the molecule to be predicted;
and input the graph data representation of the molecule to be predicted into a fully-connected network to obtain the molecular property prediction result of the molecule to be predicted.
Optionally, the determining module is specifically configured to:
normalize the correlation coefficient to obtain an attention coefficient;
update the feature vector of each node of the graph data using the attention coefficient to obtain updated graph data;
and perform a graph pooling operation on the updated graph data to obtain the graph data representation of the molecule to be predicted.
Optionally, the correlation coefficient is normalized by the following formula:
$$\alpha_{ij} = \frac{\exp\left(\sigma(e_{ij})\right)}{\sum_{k=1}^{N} \exp\left(\sigma(e_{ik})\right)}$$
where $\alpha_{ij}$ represents the attention coefficient, $e_{ij}$ represents the correlation coefficient of the feature vectors of two adjacent nodes in the graph data, $k$ is the summation index, $N$ is the number of adjacent nodes of the i-th node, and $\sigma$ represents an activation function.
Optionally, the feature vector of each node of the graph data is updated by the following formula:
$$h_i' = \Big\Vert_{k=1}^{K} \sigma\left(\sum_{j=1}^{N} \alpha_{ij}^{k} W h_j\right)$$
where the double vertical line represents concatenation, $h_i'$ represents the updated feature vector of the i-th node, $W$ represents the weight matrix of the linear transformation, $h_j$ represents the feature vector of the j-th node before the update, $\sigma$ represents the activation function, and $K$ represents the number of independent attention coefficient matrices.
In a third aspect of embodiments of the present application, there is provided a storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps of the method of any of the first aspects described above when run.
In a fourth aspect of embodiments of the present application, there is provided an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of the method according to any of the first aspects above.
With the above technical solution, graph data of the molecule to be predicted is obtained; for each node in the graph data, the feature vector of the node and the feature vector of each adjacent node are feature-enhanced and concatenated to obtain a high-dimensional feature vector; each high-dimensional feature vector is mapped through a variational quantum circuit to obtain a correlation coefficient between adjacent node feature vectors in the graph data; and a molecular property prediction result for the molecule to be predicted is determined from the correlation coefficient. Because the high-dimensional feature vector between adjacent nodes is mapped through the variational quantum circuit, the parallel computation and entanglement properties of the circuit allow the features of complex graph data to be computed accurately when obtaining the correlation coefficient between adjacent nodes. The molecular property of the molecule to be predicted can then be predicted from that correlation coefficient, which improves the accuracy of the prediction result.
Drawings
FIG. 1 is a block diagram of the hardware structure of a computer terminal for a molecular property prediction method according to an exemplary embodiment;
FIG. 2 is a flow chart illustrating a method of molecular property prediction according to an exemplary embodiment;
FIG. 3 is a flow chart illustrating another molecular property prediction method according to an exemplary embodiment;
FIG. 4 is a schematic diagram of a variational quantum circuit shown according to an exemplary embodiment;
FIG. 5 is a block diagram illustrating a molecular property prediction apparatus according to an exemplary embodiment.
Detailed Description
The embodiments described below by referring to the drawings are exemplary only for the purpose of illustrating the present application and are not to be construed as limiting the present application.
The embodiment of the application first provides a molecular property prediction method, which can be applied to electronic devices such as computer terminals, specifically ordinary computers, quantum computers, and the like.
The following takes running the method on a computer terminal as an example. FIG. 1 is a block diagram of the hardware structure of a computer terminal for a molecular property prediction method according to an exemplary embodiment. As shown in FIG. 1, the computer terminal may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as a microcontroller (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing the quantum-circuit-based molecular property prediction method, and optionally a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those skilled in the art that the configuration shown in FIG. 1 is merely illustrative and does not limit the configuration of the computer terminal described above. For example, the computer terminal may include more or fewer components than shown in FIG. 1, or have a different configuration from that shown in FIG. 1.
The memory 104 may be used to store software programs and modules of application software, such as program instructions/modules corresponding to the molecular property prediction method in the embodiments of the present application, and the processor 102 executes the software programs and modules stored in the memory 104 to perform various functional applications and data processing, i.e., implement the method described above. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located relative to the processor 102, which may be connected to the computer terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network may include a wireless network provided by the communication provider of the computer terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a radio frequency (Radio Frequency, RF) module used to communicate with the internet wirelessly.
It should be noted that a real quantum computer has a hybrid structure comprising two major parts: one part is a classical computer, responsible for classical computation and control; the other part is a quantum device, responsible for running quantum programs to realize quantum computation. A quantum program is a sequence of instructions, written in a quantum language such as QRunes, that can run on a quantum computer; it supports quantum logic gate operations and ultimately realizes quantum computation. Specifically, a quantum program is a sequence of instructions that operates quantum logic gates in a certain time order.
In practical applications, because quantum hardware is still under development, quantum computing simulation is often required to verify quantum algorithms, quantum applications, and the like. Quantum computing simulation is the process of simulating the execution of a quantum program for a specific problem on a virtual architecture (a quantum virtual machine) built from the resources of an ordinary computer. In general, a quantum program corresponding to the specific problem must be constructed. The quantum program referred to in the embodiments of the present application is a program written in a classical language that characterizes qubits and their evolution, in which the qubits, quantum logic gates, and other elements of quantum computation are all represented by corresponding classical code.
A quantum circuit, also called a quantum logic circuit, is one embodiment of a quantum program and the most commonly used general quantum computing model. It represents a circuit that operates on qubits at an abstract level and consists of qubits, lines (timelines), and various quantum logic gates; the result usually has to be read out at the end through a quantum measurement operation.
Unlike a conventional circuit, in which metal wires carry voltage or current signals, the lines in a quantum circuit can be regarded as connections in time: the state of a qubit evolves naturally over time according to the Hamiltonian until it encounters a logic gate, which then operates on it.
A quantum program generally corresponds to one overall quantum circuit; the quantum program described herein refers to that overall quantum circuit, and the total number of qubits in the overall quantum circuit equals the total number of qubits in the quantum program. In other words, a quantum program may consist of a quantum circuit, measurement operations on the qubits in the quantum circuit, registers that hold the measurement results, and control flow nodes (jump instructions), and a quantum circuit may contain tens, hundreds, or even thousands of quantum logic gate operations. Executing a quantum program means executing all of its quantum logic gates in a certain time order. Note that the time order here is the order in time in which the individual quantum logic gates are executed.
It should be noted that in classical computation the most basic unit is the bit and the most basic control mode is the logic gate; a control circuit achieves its purpose through a combination of logic gates. Similarly, qubits are manipulated with quantum logic gates. Quantum logic gates make quantum states evolve and are the basis of quantum circuits. They include single-qubit quantum logic gates, such as the Hadamard gate (H gate), Pauli-X gate (X gate), Pauli-Y gate (Y gate), Pauli-Z gate (Z gate), RX gate (RX rotation gate), RY gate (RY rotation gate), RZ gate (RZ rotation gate), and the like; and multi-qubit quantum logic gates, such as the CNOT gate, CR gate, iSWAP gate, Toffoli gate, and the like. A quantum logic gate is typically represented by a unitary matrix, which is not only a matrix but also an operation and a transformation. In general, a quantum logic gate acts on a quantum state by multiplying the unitary matrix by the vector corresponding to the ket of the quantum state. For example, the vector corresponding to the ket $|0\rangle$ may be $\begin{bmatrix}1 \\ 0\end{bmatrix}$, and the vector corresponding to the ket $|1\rangle$ may be $\begin{bmatrix}0 \\ 1\end{bmatrix}$.
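To make the matrix picture concrete, the following Python sketch applies two single-qubit gates to the state vector of the zero ket by plain matrix-vector multiplication. The helper name `apply` and the constants are illustrative only; the gate matrices are the standard Hadamard and Pauli-X matrices.

```python
import math

S = 1 / math.sqrt(2)

# Standard single-qubit gate matrices (rows of a 2x2 unitary).
H = [[S, S], [S, -S]]
X = [[0.0, 1.0], [1.0, 0.0]]

# State vectors of the basis kets |0> and |1>.
KET0 = [1.0, 0.0]
KET1 = [0.0, 1.0]


def apply(gate, ket):
    # A gate acts on a state by multiplying its matrix by the state vector.
    return [sum(g * a for g, a in zip(row, ket)) for row in gate]


plus = apply(H, KET0)     # equal superposition of |0> and |1>
flipped = apply(X, KET0)  # X maps |0> to |1>
print(plus, flipped)
```

The H gate turns the zero ket into an equal superposition, which is the state the encoding layer described later starts from.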
Referring to fig. 2, fig. 2 is a flowchart illustrating a molecular property prediction method according to an exemplary embodiment, and the embodiment of the present application provides a molecular property prediction method, which includes:
S201, obtaining graph data of the molecule to be predicted.
The feature vector of each node of the graph data represents one atom of a molecule to be predicted, two nodes with connection relations (edges) in the graph data are adjacent nodes, and an edge between the adjacent nodes represents a chemical bond between the two atoms.
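One possible in-memory form of such graph data is sketched below in Python: a list of per-atom feature vectors plus a list of bonds. The class name `MolecularGraph`, the water example, and the two illustrative features per atom (atomic number and number of bonds) are assumptions and do not come from the source.

```python
from dataclasses import dataclass


@dataclass
class MolecularGraph:
    node_features: list  # one feature vector per atom (node)
    edges: list          # chemical bonds as undirected (i, j) index pairs

    def neighbours(self, i):
        """Return the indices of the nodes adjacent to node i."""
        out = []
        for a, b in self.edges:
            if a == i:
                out.append(b)
            elif b == i:
                out.append(a)
        return sorted(out)


# Water: node 0 is oxygen, nodes 1 and 2 are hydrogen.
# Hypothetical features: [atomic number, number of bonds].
water = MolecularGraph(
    node_features=[[8.0, 2.0], [1.0, 1.0], [1.0, 1.0]],
    edges=[(0, 1), (0, 2)],
)
print(water.neighbours(0))
```

Each edge then identifies one pair of adjacent nodes for which a high-dimensional feature vector is built in S202.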
S202, for each node in the graph data, performing feature enhancement on the feature vector of the node and the feature vector of each adjacent node, and concatenating them to obtain a high-dimensional feature vector.
For example, let the feature vector of node i be $h_i$ and the feature vector of node j be $h_j$. If node i and node j are adjacent nodes, the feature vectors $h_i$ and $h_j$ can be feature-enhanced with the parameter $w$ to obtain the enhanced feature vectors $wh_i$ and $wh_j$, which are then concatenated to obtain the high-dimensional feature vector $[wh_i \,\|\, wh_j]$, where the double vertical line denotes concatenation.
It will be appreciated that a high-dimensional feature vector between every two adjacent nodes in the graph data can be obtained in the manner described above.
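The enhancement and concatenation step can be sketched in Python as follows. The sketch assumes that the parameter w is a weight matrix applied to both feature vectors (the source does not say whether w is a scalar or a matrix), and the names `enhance` and `edge_vector` are hypothetical.

```python
def enhance(w, h):
    # Feature enhancement: multiply the feature vector by the parameter w,
    # here assumed to be a matrix given as a list of rows.
    return [sum(w_rc * x for w_rc, x in zip(row, h)) for row in w]


def edge_vector(w, h_i, h_j):
    # Concatenate the two enhanced vectors into one high-dimensional vector.
    return enhance(w, h_i) + enhance(w, h_j)


w = [[0.5, 0.0], [0.0, 2.0]]  # illustrative values only
h_i = [8.0, 2.0]
h_j = [1.0, 1.0]
vec = edge_vector(w, h_i, h_j)
print(vec)
```

Two 2-dimensional node vectors give a 4-dimensional concatenated vector, which matches the 4-qubit example used for the encoding layer.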
S203, mapping each high-dimensional feature vector through a variational quantum circuit to obtain a correlation coefficient between adjacent node feature vectors in the graph data.
S204, determining a molecular property prediction result of the molecule to be predicted based on the correlation coefficient.
With the embodiment of the application, graph data of the molecule to be predicted is obtained; for each node in the graph data, the feature vector of the node and the feature vector of each adjacent node are feature-enhanced and concatenated to obtain a high-dimensional feature vector; each high-dimensional feature vector is mapped through a variational quantum circuit to obtain a correlation coefficient between adjacent node feature vectors in the graph data; and a molecular property prediction result for the molecule to be predicted is determined from the correlation coefficient. Because the high-dimensional feature vector between adjacent nodes is mapped through the variational quantum circuit, the parallel computation and entanglement properties of the circuit allow the features of complex graph data to be computed accurately when obtaining the correlation coefficient between adjacent nodes. The molecular property of the molecule to be predicted can then be predicted from that correlation coefficient, which improves the accuracy of the prediction result.
In another embodiment of the present application, the variational quantum circuit includes an encoding layer, a parameter-containing layer, and a measurement layer. As shown in FIG. 3, S203, mapping each high-dimensional feature vector through the variational quantum circuit to obtain a correlation coefficient between adjacent nodes in the graph data, may be specifically implemented as follows:
S2031, encoding each high-dimensional feature vector onto a preset number of qubits using the encoding layer.
In this embodiment, the encoding layer includes an H gate and a first RY gate, and S2031 may be specifically implemented as:
applying the H gate to each of a preset number of qubits so that each qubit evolves from its initial state to a superposition state, and applying the first RY gate to each qubit with quantum gate parameters determined by the high-dimensional feature vector, so that the high-dimensional feature vector is encoded onto the preset number of qubits in the superposition state.
The preset number may be set according to the number of features in the high-dimensional feature vector. For example, if the high-dimensional feature vector obtained in S202 contains 4 features, the preset number may be set to 4, and each high-dimensional feature vector is encoded onto 4 qubits.
FIG. 4 is a schematic diagram of a variational quantum circuit according to an embodiment of the present application. The variational quantum circuit shown in FIG. 4 includes four qubits, q0 to q3, together with an encoding layer, a parameter-containing layer, and a measurement layer. The encoding layer includes an H gate and a first RY gate. The H gate acts on the initial state $|0\rangle$ of each of the four qubits q0 to q3 and converts it into the superposition state $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$. The features $a_0, a_1, a_2, a_3$ of the high-dimensional feature vector are then used as the parameters of the first RY gate, in the form $RY(a_j)$, $j = 0, 1, 2, 3$. The first RY gate acts on the qubits in the superposition state, and the features in the high-dimensional feature vector are mapped one by one onto the corresponding qubits.
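Because the encoding layer uses only single-qubit gates, the state of each qubit after encoding can be computed independently. The Python sketch below does this with the standard real-valued RY rotation matrix; the function names `encode_qubit` and `encode` and the sample feature values are hypothetical.

```python
import math


def encode_qubit(a):
    """Amplitudes (of |0>, of |1>) after applying H and then RY(a) to |0>."""
    amp0 = amp1 = 1 / math.sqrt(2)             # state after the H gate
    c, s = math.cos(a / 2), math.sin(a / 2)    # RY(a) = [[c, -s], [s, c]]
    return (c * amp0 - s * amp1, s * amp0 + c * amp1)


def encode(features):
    # One qubit per feature, as in the 4-feature, 4-qubit example.
    return [encode_qubit(a) for a in features]


states = encode([0.0, math.pi / 2, -math.pi / 2, 0.3])
for j, (amp0, amp1) in enumerate(states):
    print(f"q{j}: P(1) = {amp1 ** 2:.4f}")
```

Under this encoding the probability of reading 1 on a qubit is (1 + sin a) / 2, so features are best scaled into the range from -π/2 to π/2 if distinct values should map to distinct states.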
S2032, performing quantum state evolution on the encoded qubits using the parameter-containing layer.
The parameter-containing layer includes an RX gate, a CNOT gate, and a second RY gate.
In this embodiment of the present application, the main function of the CNOT gates is to create quantum entanglement so that information can be exchanged and transferred between qubits. As shown in FIG. 4, CNOT gates entangle each pair of adjacent qubits as well as the first and last qubits. In addition, the variational quantum circuit introduces an RX gate and a second RY gate, which contain the training parameters ψ and Φ respectively. Iteratively optimizing ψ and Φ optimizes the variational quantum circuit so that it can accurately learn the correlation between adjacent nodes.
It should be noted that, in the variational quantum circuit provided by the embodiment of the application, the parameter-containing layer can be stacked multiple times according to the specific quantum encoding task, which increases the circuit depth in the search for a better variational quantum circuit.
S2033, measuring the quantum state of any quantum bit after evolution by using a measuring layer, and calculating a measurement result expected value to obtain a correlation coefficient.
The measuring layer is the last layer of the variable component sub-circuit, and has the function of decoherence of the quantum bits and conversion of the quantum data into classical data.
Illustratively, as shown in fig. 4, the measurement layer may measure the quantum state of the first quantum bit q0, calculate the expected value, and obtain the correlation coefficient.
According to the embodiment of the application, the high-dimensional feature vectors between the adjacent nodes are encoded onto the preset number of quantum bits through the encoding layer included in the quantum circuit, the quantum state evolution is carried out on the encoded quantum bits through the parameter-containing layering, the quantum state of any quantum bit after the evolution is measured through the measuring layer, the expected value is calculated, the correlation coefficient between the adjacent nodes is obtained, and the processing of complex molecular diagram data is realized by means of the parallel calculation characteristic and the quantum state entanglement characteristic of the variable component sub circuit, so that the accuracy and the speed of molecular property prediction can be improved.
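The three layers can be put together in a small state-vector simulation. This is a sketch under stated assumptions, not the circuit of fig. 4 itself: the gate order inside the parameterized layer (RX, then a ring of CNOT gates, then RY), the use of the Pauli-Z expectation on q0 as the measured quantity, and all function names are assumptions made for illustration.

```python
import numpy as np

N_QUBITS = 4
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)


def rx(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])


def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def apply_gate(state, gate, qubit):
    """Apply a single-qubit gate to one qubit of the full state vector."""
    psi = state.reshape([2] * N_QUBITS)
    psi = np.tensordot(gate, psi, axes=([1], [qubit]))
    return np.moveaxis(psi, 0, qubit).reshape(-1)


def apply_cnot(state, control, target):
    """Flip the target qubit on the part of the state where control is 1."""
    psi = state.reshape([2] * N_QUBITS).copy()
    index = [slice(None)] * N_QUBITS
    index[control] = 1
    axis = target if target < control else target - 1
    psi[tuple(index)] = np.flip(psi[tuple(index)], axis=axis).copy()
    return psi.reshape(-1)


def correlation_coefficient(features, psi, phi):
    """Expectation value of Pauli-Z on q0 after encoding and evolution."""
    state = np.zeros(2 ** N_QUBITS, dtype=complex)
    state[0] = 1.0  # all qubits start in |0>
    for q in range(N_QUBITS):  # encoding layer: H, then RY(a_q)
        state = apply_gate(state, H, q)
        state = apply_gate(state, ry(features[q]), q)
    for q in range(N_QUBITS):  # parameterized layer: trainable RX
        state = apply_gate(state, rx(psi[q]), q)
    for q in range(N_QUBITS):  # CNOT ring: neighbours, plus last to first
        state = apply_cnot(state, q, (q + 1) % N_QUBITS)
    for q in range(N_QUBITS):  # parameterized layer: trainable RY
        state = apply_gate(state, ry(phi[q]), q)
    probs = np.abs(state.reshape([2] * N_QUBITS)) ** 2
    return float(probs[0].sum() - probs[1].sum())  # P(q0=0) - P(q0=1)


rng = np.random.default_rng(0)
print(correlation_coefficient(rng.uniform(0, np.pi, 4),
                              rng.uniform(0, np.pi, 4),
                              rng.uniform(0, np.pi, 4)))
```

The returned value always lies in the interval [-1, 1], which makes it a natural raw score to normalize into attention coefficients in the next step.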
In another embodiment of the present application, determining the molecular property prediction result of the molecule to be predicted based on the correlation coefficient in S204 may be specifically implemented as follows:
Step one: perform graph attention calculation according to the correlation coefficient, and perform a graph pooling operation on the calculation result to obtain a graph data representation of the molecule to be predicted.
Specifically, the correlation coefficient can be normalized to obtain an attention coefficient; the feature vector of each node of the graph data is updated through attention coefficient calculation to obtain updated graph data; and a graph pooling operation is performed on the updated graph data to obtain the graph data representation of the molecule to be predicted.
Further, the correlation coefficient may be normalized by the following formula:
$$\alpha_{ij} = \frac{\exp\left(\sigma(e_{ij})\right)}{\sum_{k=1}^{N} \exp\left(\sigma(e_{ik})\right)}$$
where $\alpha_{ij}$ represents the attention coefficient, $e_{ij}$ represents the correlation coefficient of the feature vectors between two adjacent nodes in the graph data, $k$ represents the summation index, $N$ represents the number of adjacent nodes of the $i$-th node, and $\sigma$ represents an activation function.
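A minimal sketch of this normalization is given here, assuming it takes the softmax form commonly used in graph attention networks. The text does not name the activation function, so LeakyReLU with a slope of 0.2 is used purely as a stand-in, and the function names are hypothetical.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    """Stand-in activation; the actual activation is not specified."""
    return np.where(x > 0, x, slope * x)


def attention_coefficients(correlations, activation=leaky_relu):
    """Normalize the correlation coefficients of one node's N neighbours."""
    z = activation(np.asarray(correlations, dtype=float))
    z = z - z.max()  # subtract the maximum for numerical stability
    weights = np.exp(z)
    return weights / weights.sum()


# Correlation coefficients between node i and three adjacent nodes.
print(attention_coefficients([0.8, -0.3, 0.1]))
```

After normalization the coefficients of a node's neighbours are positive and sum to 1, so they can be used directly as aggregation weights.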
Further, the feature vector of each node of the graph data may be updated by the following formula:
$$h_i' = \Big\Vert_{k=1}^{K} \sigma\left(\sum_{j=1}^{N} \alpha_{ij}^{k} W^{k} h_j\right)$$
where the double vertical lines represent concatenation, $h_i'$ represents the updated feature vector of the $i$-th node, $W^{k}$ represents the weight matrix of the linear transformation, $h_j$ represents the feature vector of the $j$-th node before the update, $\sigma$ represents the activation function, $N$ represents the number of adjacent nodes of the $i$-th node, and $K$ represents the number of independent attention coefficient matrices.
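The update can be sketched as follows, assuming the usual multi-head form in which each of the K attention coefficient matrices has its own weight matrix and the K results are concatenated. The array shapes, the `tanh` activation, and the function name are illustrative assumptions.

```python
import numpy as np


def update_node_features(neighbor_features, attention, weights,
                         activation=np.tanh):
    """Update one node from its N neighbours using K attention heads.

    neighbor_features: (N, F) feature vectors before the update
    attention:         (K, N) attention coefficients, one row per head
    weights:           (K, F_out, F) linear transformation per head
    """
    neighbor_features = np.asarray(neighbor_features, dtype=float)
    heads = []
    for alpha_k, w_k in zip(attention, weights):
        projected = neighbor_features @ np.asarray(w_k).T  # (N, F_out)
        aggregated = np.asarray(alpha_k) @ projected        # (F_out,)
        heads.append(activation(aggregated))
    return np.concatenate(heads)  # splice the K head outputs


neighbors = np.array([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])
attention = np.full((2, 3), 1.0 / 3.0)   # K = 2 heads, N = 3 neighbours
weights = np.stack([np.eye(2), np.eye(2)])
print(update_node_features(neighbors, attention, weights))
```

With K heads of output size F_out, the updated feature vector has K times F_out entries.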
Step two: input the graph data representation of the molecule to be predicted into a fully connected network to obtain the molecular property prediction result of the molecule to be predicted.
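The pooling and prediction steps can be sketched as follows. The text does not specify the pooling operator or the network size, so mean pooling, a single hidden layer with ReLU, and all parameter names here are assumptions chosen for brevity.

```python
import numpy as np


def predict_property(node_features, w_hidden, b_hidden, w_out, b_out):
    """Pool updated node features into one vector, then apply a small MLP."""
    # Graph pooling (assumed: mean over nodes) gives a fixed-size
    # representation regardless of how many atoms the molecule has.
    graph_repr = np.asarray(node_features, dtype=float).mean(axis=0)
    hidden = np.maximum(0.0, w_hidden @ graph_repr + b_hidden)  # ReLU
    return w_out @ hidden + b_out


rng = np.random.default_rng(0)
nodes = rng.normal(size=(5, 8))  # 5 atoms, 8 features each (hypothetical)
params = (rng.normal(size=(16, 8)), np.zeros(16),
          rng.normal(size=(1, 16)), np.zeros(1))
prediction = predict_property(nodes, *params)
print(prediction)
```

Because the pooling averages over nodes, the prediction does not depend on the order in which the atoms are listed.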
The molecular property prediction method provided by the embodiment of the application can be used for performing classification tasks, such as molecular toxicity prediction, and also can be used for performing regression tasks, such as molecular energy gap prediction. Different loss functions may be selected depending on the task.
When performing a classification task, the loss function is the cross-entropy loss function:
$$H(p, q) = -\sum_{x} p(x) \log q(x)$$
where the probability distribution $p$ is the desired output and the probability distribution $q$ is the actual output. The smaller the value of the cross-entropy loss function, the closer the probability distribution $q$ of the actual output is to the probability distribution $p$ of the desired output.
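As a small worked example of the cross-entropy loss, the sketch below compares two hypothetical model outputs against the same one-hot target. The function name and the small constant `eps`, which guards against taking the logarithm of zero, are assumptions added for illustration.

```python
import numpy as np


def cross_entropy(expected, actual, eps=1e-12):
    """Cross entropy between the desired and the actual distribution."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(-np.sum(expected * np.log(actual + eps)))


target = [1.0, 0.0]                            # e.g. "toxic" as a one-hot label
print(cross_entropy(target, [0.9, 0.1]))       # confident, correct output
print(cross_entropy(target, [0.6, 0.4]))       # less confident output
```

The first output is closer to the target distribution and therefore yields the smaller loss.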
When performing a regression task, the loss function is the mean square error loss function:
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2$$
where $y_i$ is the data label value, $\hat{y}_i$ is the model output value, and $n$ is the number of samples. The model parameters are optimized according to the calculated loss function value until the loss function value reaches a preset threshold, at which point model training is confirmed to be complete. An Adam gradient update algorithm with a learning rate of 0.001 may be used.
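The training loop described here can be sketched on a toy problem. Only the learning rate of 0.001 and the stop-at-threshold rule come from the text; the one-parameter linear model, the threshold value, the step limit, and the remaining Adam hyperparameters (the commonly used defaults) are assumptions for illustration.

```python
import numpy as np


def mse(labels, outputs):
    """Mean square error between label values and model output values."""
    return float(np.mean((np.asarray(labels) - np.asarray(outputs)) ** 2))


def adam_step(param, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update; returns the new parameter and moment estimates."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)  # bias-corrected second moment
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v


def train(x, y, threshold=1e-3, max_steps=20000):
    """Fit y ~ w * x, stopping once the loss reaches the threshold."""
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, max_steps + 1):
        outputs = w * x
        if mse(y, outputs) <= threshold:
            break
        grad = float(np.mean(2 * (outputs - y) * x))  # d(MSE)/dw
        w, m, v = adam_step(w, grad, m, v, t)
    return w, mse(y, w * x)


x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x
w, final_loss = train(x, y)
print(w, final_loss)
```

In a real model the same update would be applied to every trainable parameter, including the rotation angles ψ and Φ of the quantum circuit.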
Based on the same inventive concept, the embodiments of the present application also provide a molecular property prediction apparatus, as shown in fig. 5, including:
an obtaining module 501, configured to obtain graph data of a molecule to be predicted, where a feature vector of each node of the graph data represents one atom of the molecule to be predicted;
a splicing module 502, configured to, for each node in the graph data, perform feature enhancement on the feature vector of the node and the feature vector of an adjacent node, and splice them to obtain a high-dimensional feature vector;
a mapping module 503, configured to map each high-dimensional feature vector through a variational quantum circuit, so as to obtain a correlation coefficient between feature vectors of adjacent nodes in the graph data;
a determining module 504, configured to determine a molecular property prediction result of the molecule to be predicted based on the correlation coefficient.
Optionally, the variational quantum circuit includes an encoding layer, a parameterized layer, and a measurement layer, and the mapping module 503 is specifically configured to:
encode each high-dimensional feature vector onto a preset number of qubits by using the encoding layer;
perform quantum state evolution on the encoded qubits by using the parameterized layer;
measure the quantum state of any one of the evolved qubits by using the measurement layer, and calculate the expected value of the measurement result to obtain the correlation coefficient.
Optionally, the encoding layer includes an H gate and a first RY gate, and the mapping module 503 is specifically configured to:
apply the H gate to the preset number of qubits, so that each qubit evolves from the initial state to a superposition state, and apply the first RY gate to each qubit according to quantum gate parameters determined by the high-dimensional feature vector, so that the high-dimensional feature vector is encoded onto the preset number of qubits in the superposition state.
Optionally, the parameterized layer includes an RX gate, a CNOT gate, and a second RY gate.
Optionally, the determining module 504 is specifically configured to:
perform graph attention calculation according to the correlation coefficient, and perform a graph pooling operation on the calculation result to obtain the graph data representation of the molecule to be predicted;
input the graph data representation of the molecule to be predicted into a fully connected network to obtain the molecular property prediction result of the molecule to be predicted.
Optionally, the determining module 504 is specifically configured to:
normalizing the correlation coefficient to obtain an attention coefficient;
updating the feature vector of each node of the graph data through attention coefficient calculation to obtain updated graph data;
and performing a graph pooling operation on the updated graph data to obtain the graph data representation of the molecule to be predicted.
Optionally, the correlation coefficient is normalized by the following formula:
$$\alpha_{ij} = \frac{\exp\left(\sigma(e_{ij})\right)}{\sum_{k=1}^{N} \exp\left(\sigma(e_{ik})\right)}$$
where $\alpha_{ij}$ represents the attention coefficient, $e_{ij}$ represents the correlation coefficient of the feature vectors between two adjacent nodes in the graph data, $k$ represents the summation index, $N$ represents the number of adjacent nodes of the $i$-th node, and $\sigma$ represents an activation function.
Optionally, the feature vector of each node of the graph data is updated by the following formula:
$$h_i' = \Big\Vert_{k=1}^{K} \sigma\left(\sum_{j=1}^{N} \alpha_{ij}^{k} W^{k} h_j\right)$$
where the double vertical lines represent concatenation, $h_i'$ represents the updated feature vector of the $i$-th node, $W^{k}$ represents the weight matrix of the linear transformation, $h_j$ represents the feature vector of the $j$-th node before the update, $\sigma$ represents the activation function, $N$ represents the number of adjacent nodes of the $i$-th node, and $K$ represents the number of independent attention coefficient matrices.
The specific manner in which the modules of the apparatus in the above embodiments perform their operations has been described in detail in connection with the method embodiments and is not repeated here.
Still another embodiment of the present application provides a storage medium having a computer program stored therein, wherein the computer program is configured to perform the steps in the molecular property prediction method embodiments described above when run.
Specifically, in the present embodiment, the storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing a computer program.
Yet another embodiment of the present application provides an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of the molecular property prediction method embodiments described above.
Specifically, the electronic apparatus may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.
Specifically, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
Step one: obtaining graph data of a molecule to be predicted.
Step two: for each node in the graph data, performing feature enhancement on the feature vector of the node and the feature vector of an adjacent node, and splicing them to obtain a high-dimensional feature vector.
Step three: mapping each high-dimensional feature vector through a variational quantum circuit to obtain a correlation coefficient between adjacent node feature vectors in the graph data.
Step four: determining a molecular property prediction result of the molecule to be predicted based on the correlation coefficient.
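The splicing in step two can be sketched as follows. How feature enhancement is carried out is not restated in this passage, so the sketch models it as a shared linear map, which is an assumption; the function name, the edge list, and the dimensions (3 raw features per atom, enhanced to 2, giving a 4-feature spliced vector that matches the 4-qubit example) are likewise illustrative.

```python
import numpy as np


def build_pair_vectors(node_features, edges, w_enhance):
    """Enhance every node feature vector, then splice each adjacent pair."""
    enhanced = np.asarray(node_features, dtype=float) @ w_enhance.T
    return {(i, j): np.concatenate([enhanced[i], enhanced[j]])
            for i, j in edges}


rng = np.random.default_rng(0)
atoms = rng.normal(size=(3, 3))      # 3 atoms with 3 raw features each
bonds = [(0, 1), (1, 0), (1, 2), (2, 1)]
w_enhance = rng.normal(size=(2, 3))  # hypothetical enhancement weights
pair_vectors = build_pair_vectors(atoms, bonds, w_enhance)
print(pair_vectors[(0, 1)])
```

Each spliced vector would then be passed to the quantum circuit in step three to obtain the correlation coefficient for that pair of adjacent nodes.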
The construction, features, and advantages of the present application have been described in detail above with reference to the embodiments illustrated in the drawings. The foregoing is merely a description of preferred embodiments of the application, and the scope of the application is not limited to the embodiments illustrated in the drawings.
Claims (11)
1. A method of predicting molecular properties, the method comprising:
obtaining graph data of a molecule to be predicted, wherein a feature vector of each node of the graph data represents one atom of the molecule to be predicted;
for each node in the graph data, carrying out feature enhancement on the feature vector of the node and the feature vector of the adjacent node, and splicing to obtain a high-dimensional feature vector;
mapping each high-dimensional feature vector through a variational quantum circuit to obtain a correlation coefficient between adjacent node feature vectors in the graph data;
and determining a molecular property prediction result of the molecule to be predicted based on the correlation coefficient.
2. The method of claim 1, wherein the variational quantum circuit comprises an encoding layer, a parameterized layer, and a measurement layer, and wherein mapping each high-dimensional feature vector through the variational quantum circuit to obtain the correlation coefficient between adjacent node feature vectors in the graph data comprises:
encoding each high-dimensional feature vector onto a preset number of qubits by using the encoding layer;
performing quantum state evolution on the encoded qubits by using the parameterized layer;
measuring the quantum state of any one of the evolved qubits by using the measurement layer, and calculating the expected value of the measurement result to obtain the correlation coefficient.
3. The method of claim 2, wherein the encoding layer comprises an H gate and a first RY gate, and wherein encoding each high-dimensional feature vector onto a preset number of qubits by using the encoding layer comprises:
applying the H gate to the preset number of qubits, so that each qubit evolves from an initial state to a superposition state, and applying the first RY gate to each qubit according to quantum gate parameters determined by the high-dimensional feature vector, so that the high-dimensional feature vector is encoded onto the preset number of qubits in the superposition state.
4. The method according to claim 2 or 3, wherein the parameterized layer comprises an RX gate, a CNOT gate, and a second RY gate.
5. The method of claim 1, wherein said determining a molecular property prediction of said molecule to be predicted based on said correlation coefficient comprises:
performing graph attention calculation according to the correlation coefficient, and performing graph pooling operation on a calculation result to obtain graph data representation of the molecules to be predicted;
and inputting the graph data representation of the molecule to be predicted into a fully connected network to obtain the molecular property prediction result of the molecule to be predicted.
6. The method according to claim 5, wherein the performing graph attention calculation according to the correlation coefficient, and performing a graph pooling operation on a calculation result to obtain the graph data representation of the molecule to be predicted, includes:
normalizing the correlation coefficient to obtain an attention coefficient;
updating the feature vector of each node of the graph data through the attention coefficient calculation to obtain updated graph data;
and performing a graph pooling operation on the updated graph data to obtain the graph data representation of the molecule to be predicted.
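7. The method of claim 6, wherein the correlation coefficient is normalized by the following formula:
$$\alpha_{ij} = \frac{\exp\left(\sigma(e_{ij})\right)}{\sum_{k=1}^{N} \exp\left(\sigma(e_{ik})\right)}$$
where $\alpha_{ij}$ represents the attention coefficient, $e_{ij}$ represents the correlation coefficient of the feature vectors between two adjacent nodes in the graph data, $k$ represents the summation index, $N$ represents the number of adjacent nodes of the $i$-th node, and $\sigma$ represents an activation function.
8. The method of claim 7, wherein the feature vector of each node of the graph data is updated by the following formula:
$$h_i' = \Big\Vert_{k=1}^{K} \sigma\left(\sum_{j=1}^{N} \alpha_{ij}^{k} W^{k} h_j\right)$$
where the double vertical lines represent concatenation, $h_i'$ represents the updated feature vector of the $i$-th node, $W^{k}$ represents the weight matrix of the linear transformation, $h_j$ represents the feature vector of the $j$-th node before the update, $\sigma$ represents the activation function, and $K$ represents the number of independent attention coefficient matrices.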
9. A molecular property prediction apparatus, the apparatus comprising:
an acquisition module, configured to obtain graph data of a molecule to be predicted, wherein a feature vector of each node of the graph data represents one atom of the molecule to be predicted;
a splicing module, configured to, for each node in the graph data, perform feature enhancement on the feature vector of the node and the feature vector of an adjacent node, and splice them to obtain a high-dimensional feature vector;
a mapping module, configured to map each high-dimensional feature vector through a variational quantum circuit to obtain a correlation coefficient between adjacent node feature vectors in the graph data;
and a determining module, configured to determine a molecular property prediction result of the molecule to be predicted based on the correlation coefficient.
10. A storage medium having a computer program stored therein, wherein the computer program is arranged to perform the method of any of claims 1 to 8 when run.
11. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to run the computer program to perform the method of any of the claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310250808.4A CN116306952A (en) | 2023-03-16 | 2023-03-16 | Molecular property prediction method and device, storage medium and electronic device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310250808.4A CN116306952A (en) | 2023-03-16 | 2023-03-16 | Molecular property prediction method and device, storage medium and electronic device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116306952A true CN116306952A (en) | 2023-06-23 |
Family
ID=86814560
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310250808.4A Pending CN116306952A (en) | 2023-03-16 | 2023-03-16 | Molecular property prediction method and device, storage medium and electronic device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116306952A (en) |
-
2023
- 2023-03-16 CN CN202310250808.4A patent/CN116306952A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115144934B (en) | Meteorological prediction method based on variable component sub-line and related equipment | |
CN114358317B (en) | Data classification method based on machine learning framework and related equipment | |
CN116187548A (en) | Photovoltaic power generation power prediction method and device, storage medium and electronic device | |
CN114819168B (en) | Quantum comparison method and device for matrix eigenvalues | |
CN116011682B (en) | Meteorological data prediction method and device, storage medium and electronic device | |
CN114764619A (en) | Convolution operation method and device based on quantum circuit | |
CN116011681A (en) | Meteorological data prediction method and device, storage medium and electronic device | |
CN116403657A (en) | Drug response prediction method and device, storage medium and electronic device | |
CN116431807A (en) | Text classification method and device, storage medium and electronic device | |
CN114819163B (en) | Training method and device for quantum generation countermeasure network, medium and electronic device | |
CN115809707B (en) | Quantum comparison operation method, device, electronic device and basic arithmetic component | |
CN117709415A (en) | Quantum neural network model optimization method and device | |
CN116306952A (en) | Molecular property prediction method and device, storage medium and electronic device | |
CN117151231A (en) | Method, device and medium for solving linear system by using variable component sub-line | |
CN114862079A (en) | Risk value estimation method, device, medium, and electronic device based on quantum line | |
CN116400430B (en) | Meteorological data prediction method and device, storage medium and electronic device | |
CN116167407B (en) | Quantum circulation neural network-based data prediction method and related equipment | |
CN115730670B (en) | Method and device for generating mode file, medium and electronic device | |
CN114819169B (en) | Quantum estimation method and device for matrix condition number | |
CN115775028B (en) | Quantum circuit optimization method, quantum circuit optimization device, quantum circuit optimization medium and electronic device | |
CN115713122B (en) | Method and device for determining size relation between quantum data and classical data | |
CN115775029B (en) | Quantum circuit conversion method, quantum circuit conversion device, quantum circuit conversion medium and electronic device | |
CN115879559B (en) | Method and device for judging equivalence relation among multiple quantum states and quantum computer | |
CN116541947B (en) | Grover solving method and device for SAT or MAX-SAT problem of vehicle configuration | |
CN115775030B (en) | Quantum program rewriting method and device based on pattern matching and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||