CN116306952A - Molecular property prediction method and device, storage medium and electronic device - Google Patents

Publication number
CN116306952A
Authority
CN
China
Prior art keywords
feature vector
node
graph data
correlation coefficient
predicted
Legal status
Pending
Application number
CN202310250808.4A
Other languages
Chinese (zh)
Inventor
窦猛汉
Name withheld on request
Current Assignee
Origin Quantum Computing Technology Co Ltd
Original Assignee
Origin Quantum Computing Technology Co Ltd
Application filed by Origin Quantum Computing Technology Co Ltd
Priority to CN202310250808.4A
Publication of CN116306952A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N10/00 Quantum computing, i.e. information processing based on quantum-mechanical phenomena
    • G06N10/20 Models of quantum computing, e.g. quantum circuits or universal quantum computers
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C10/00 Computational theoretical chemistry, i.e. ICT specially adapted for theoretical aspects of quantum chemistry, molecular mechanics, molecular dynamics or the like
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/20 Identification of molecular entities, parts thereof or of chemical compositions

Abstract

The application discloses a molecular property prediction method and device, a storage medium, and an electronic device, and relates to the technical field of quantum computing. The method comprises: obtaining graph data of a molecule to be predicted, wherein a feature vector of each node of the graph data represents one atom of the molecule to be predicted; for each node in the graph data, performing feature enhancement on the feature vector of the node and the feature vector of an adjacent node, and splicing them to obtain a high-dimensional feature vector; mapping each high-dimensional feature vector through a variational quantum circuit to obtain a correlation coefficient between adjacent node feature vectors in the graph data; and determining a molecular property prediction result of the molecule to be predicted based on the correlation coefficient. The method can improve the accuracy of molecular property prediction.

Description

Molecular property prediction method and device, storage medium and electronic device
Technical Field
The application belongs to the technical field of quantum computing, and particularly relates to a molecular property prediction method, a device, a storage medium and an electronic device.
Background
A quantum computer is a physical device that performs high-speed mathematical and logical operations and stores and processes quantum information according to the laws of quantum mechanics. A device that processes quantum information and runs quantum algorithms is a quantum computer. Quantum computers have become a key technology under active research because they can handle certain mathematical problems more efficiently than ordinary computers; for example, they could shorten the time needed to crack an RSA key from hundreds of years to hours.
Molecular property prediction plays an important role in many fields, such as drug discovery, chemical reaction, catalyst generation, and the like. Particularly in the field of drug discovery, achieving accurate predictions of molecular properties may accelerate the overall process of discovering candidate drugs.
Currently, molecules are usually represented as graph data (Graph), and the graph data is processed with graph neural networks (Graph Neural Networks, GNNs). However, graph convolutional neural networks (Graph convolutional neural networks, GCN), which are based on a convolution mechanism, have limitations in processing graph data. This makes complex molecular graph data difficult to process and leads to low accuracy of molecular property prediction results.
Disclosure of Invention
The purpose of the application is to provide a molecular property prediction method, a device, a storage medium and an electronic device, which aim to improve the accuracy of a molecular property prediction result.
To achieve the above object, according to a first aspect of embodiments of the present application, there is provided a molecular property prediction method, including:
obtaining graph data of a molecule to be predicted, wherein a feature vector of each node of the graph data represents one atom of the molecule to be predicted;
for each node in the graph data, carrying out feature enhancement on the feature vector of the node and the feature vector of the adjacent node, and splicing to obtain a high-dimensional feature vector;
mapping each high-dimensional feature vector through a variational quantum circuit to obtain a correlation coefficient between adjacent node feature vectors in the graph data;
and determining a molecular property prediction result of the molecule to be predicted based on the correlation coefficient.
Optionally, the variational quantum circuit includes an encoding layer, a parameter-containing layer, and a measurement layer, and the mapping each high-dimensional feature vector through the variational quantum circuit to obtain a correlation coefficient between adjacent nodes in the graph data includes:
encoding each high-dimensional feature vector onto a preset number of qubits using the encoding layer;
performing quantum state evolution on the encoded qubits using the parameter-containing layer;
and measuring the quantum state of any one of the evolved qubits using the measurement layer, and calculating the expected value of the measurement result to obtain the correlation coefficient.
Optionally, the encoding layer includes an H gate and a first RY gate, and the encoding each high-dimensional feature vector to a preset number of qubits using the encoding layer includes:
and applying an H gate to a preset number of qubits, so that each qubit evolves from an initial state to a superposition state, and applying a first RY gate to each qubit according to quantum gate parameters determined by the high-dimensional feature vector, so that the high-dimensional feature vector is encoded onto the preset number of qubits in the superposition state.
Optionally, the parameter-containing layer includes an RX gate, a CNOT gate, and a second RY gate.
Optionally, the determining the molecular property prediction result of the molecule to be predicted based on the correlation coefficient includes:
performing graph attention calculation according to the correlation coefficient, and performing graph pooling operation on a calculation result to obtain graph data representation of the molecules to be predicted;
and inputting the graph data representation of the molecule to be predicted into a fully-connected network to obtain a molecular property prediction result of the molecule to be predicted.
Optionally, the performing graph attention calculation according to the correlation coefficient, performing a graph pooling operation on a calculation result to obtain a graph data representation of the molecule to be predicted, including:
normalizing the correlation coefficient to obtain an attention coefficient;
updating the feature vector of each node of the graph data through the attention coefficient calculation to obtain updated graph data;
and performing a graph pooling operation on the updated graph data to obtain the graph data representation of the molecule to be predicted.
Optionally, the correlation coefficient is normalized by the following formula:
$$\alpha_{ij} = \frac{\exp\left(\sigma(e_{ij})\right)}{\sum_{k \in N_i} \exp\left(\sigma(e_{ik})\right)}$$
where $\alpha_{ij}$ represents the attention coefficient, $e_{ij}$ represents the correlation coefficient of the feature vectors of two adjacent nodes in the graph data, $k$ represents the summation index, $N_i$ represents the adjacent nodes of the i-th node, and $\sigma$ represents an activation function.
Optionally, the feature vector of each node of the graph data is updated by the following formula:
$$h_i' = \Big\Vert_{k=1}^{K} \sigma\left(\sum_{j \in N_i} \alpha_{ij}^{k} W^{k} h_j\right)$$
where the double vertical bars represent splicing (concatenation), $h_i'$ represents the updated feature vector of the i-th node, $W$ represents the weight matrix of the linear transformation, $h_j$ represents the feature vector of the j-th node before the update, $\sigma$ represents the activation function, and $K$ represents the number of independent attention coefficient matrices.
In a second aspect of embodiments of the present application, there is provided a molecular property prediction apparatus, the apparatus comprising:
an acquisition module, configured to acquire graph data of a molecule to be predicted, wherein a feature vector of each node of the graph data represents one atom of the molecule to be predicted;
a splicing module, configured to, for each node in the graph data, perform feature enhancement on the feature vector of the node and the feature vector of an adjacent node and splice them to obtain a high-dimensional feature vector;
a mapping module, configured to map each high-dimensional feature vector through a variational quantum circuit to obtain a correlation coefficient between adjacent node feature vectors in the graph data;
and a determining module, configured to determine a molecular property prediction result of the molecule to be predicted based on the correlation coefficient.
Optionally, the variational quantum circuit includes an encoding layer, a parameter-containing layer, and a measurement layer, and the mapping module is specifically configured to:
encoding each high-dimensional feature vector onto a preset number of qubits using the encoding layer;
performing quantum state evolution on the encoded qubits using the parameter-containing layer;
and measuring the quantum state of any quantum bit after evolution by using a measuring layer, and calculating the expected value of a measuring result to obtain the correlation coefficient.
Optionally, the coding layer includes an H gate and a first RY gate, and the mapping module is specifically configured to:
and applying an H gate to a preset number of qubits, so that each qubit evolves from an initial state to a superposition state, and applying a first RY gate to each qubit according to quantum gate parameters determined by the high-dimensional feature vector, so that the high-dimensional feature vector is encoded onto the preset number of qubits in the superposition state.
Optionally, the parameter-containing layer includes an RX gate, a CNOT gate, and a second RY gate.
Optionally, the determining module is specifically configured to:
performing graph attention calculation according to the correlation coefficient, and performing graph pooling operation on a calculation result to obtain graph data representation of the molecules to be predicted;
and inputting the graph data representation of the molecule to be predicted into a fully-connected network to obtain a molecular property prediction result of the molecule to be predicted.
Optionally, the determining module is specifically configured to:
normalizing the correlation coefficient to obtain an attention coefficient;
updating the feature vector of each node of the graph data through the attention coefficient calculation to obtain updated graph data;
and performing a graph pooling operation on the updated graph data to obtain the graph data representation of the molecule to be predicted.
Optionally, the correlation coefficient is normalized by the following formula:
$$\alpha_{ij} = \frac{\exp\left(\sigma(e_{ij})\right)}{\sum_{k \in N_i} \exp\left(\sigma(e_{ik})\right)}$$
where $\alpha_{ij}$ represents the attention coefficient, $e_{ij}$ represents the correlation coefficient of the feature vectors of two adjacent nodes in the graph data, $k$ represents the summation index, $N_i$ represents the adjacent nodes of the i-th node, and $\sigma$ represents an activation function.
Optionally, the feature vector of each node of the graph data is updated by the following formula:
$$h_i' = \Big\Vert_{k=1}^{K} \sigma\left(\sum_{j \in N_i} \alpha_{ij}^{k} W^{k} h_j\right)$$
where the double vertical bars represent splicing (concatenation), $h_i'$ represents the updated feature vector of the i-th node, $W$ represents the weight matrix of the linear transformation, $h_j$ represents the feature vector of the j-th node before the update, $\sigma$ represents the activation function, and $K$ represents the number of independent attention coefficient matrices.
In a third aspect of embodiments of the present application, there is provided a storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps of the method of any of the first aspects described above when run.
In a fourth aspect of embodiments of the present application, there is provided an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of the method according to any of the first aspects above.
With the above technical solution, graph data of the molecule to be predicted is acquired; for each node in the graph data, the feature vector of the node and the feature vector of an adjacent node are feature-enhanced and spliced to obtain a high-dimensional feature vector; each high-dimensional feature vector is mapped through a variational quantum circuit to obtain a correlation coefficient between adjacent node feature vectors in the graph data; and the molecular property prediction result of the molecule to be predicted is determined according to the correlation coefficient. Because the high-dimensional feature vectors between adjacent nodes are mapped through the variational quantum circuit, the parallel computation and entanglement characteristics of the circuit can be used to compute features of complex graph data accurately and obtain the correlation coefficients between adjacent nodes, so that the molecular properties of the molecule to be predicted can be predicted from the correlation coefficients and the accuracy of the prediction result is improved.
Drawings
FIG. 1 is a block diagram of the hardware architecture of a computer terminal showing a molecular property prediction method according to an exemplary embodiment;
FIG. 2 is a flow chart illustrating a method of molecular property prediction according to an exemplary embodiment;
FIG. 3 is a flow chart illustrating another molecular property prediction method according to an exemplary embodiment;
FIG. 4 is a schematic diagram of a variational quantum circuit shown according to an exemplary embodiment;
FIG. 5 is a block diagram illustrating a molecular property prediction apparatus according to an exemplary embodiment.
Detailed Description
The embodiments described below by referring to the drawings are exemplary only for the purpose of illustrating the present application and are not to be construed as limiting the present application.
The embodiment of the application firstly provides a molecular property prediction method which can be applied to electronic equipment such as computer terminals, in particular to common computers, quantum computers and the like.
The following describes the method in detail, taking operation on a computer terminal as an example. FIG. 1 is a block diagram of the hardware structure of a computer terminal for a molecular property prediction method according to an exemplary embodiment. As shown in FIG. 1, the computer terminal may include one or more (only one is shown in FIG. 1) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing the quantum-circuit-based molecular property prediction method, and optionally a transmission device 106 for communication functions and an input/output device 108. It will be appreciated by those skilled in the art that the configuration shown in FIG. 1 is merely illustrative and is not intended to limit the configuration of the computer terminal described above. For example, the computer terminal may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store software programs and modules of application software, such as program instructions/modules corresponding to the molecular property prediction method in the embodiments of the present application, and the processor 102 executes the software programs and modules stored in the memory 104 to perform various functional applications and data processing, i.e., implement the method described above. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located relative to the processor 102, which may be connected to the computer terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is configured to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a radio frequency (Radio Frequency, RF) module for communicating wirelessly with the internet.
It should be noted that a real quantum computer is a hybrid structure comprising two major parts: one part is a classical computer, responsible for performing classical computation and control; the other part is a quantum device, responsible for running quantum programs to realize quantum computation. A quantum program is a sequence of instructions, written in a quantum language such as QRunes, that can run on a quantum computer; it supports quantum logic gate operations and ultimately realizes quantum computation. Specifically, a quantum program is a sequence of instructions that operates quantum logic gates in a certain time order.
In practical applications, because quantum device hardware is still under development, quantum computing simulation is often required to verify quantum algorithms, quantum applications, and the like. Quantum computing simulation is the process of simulating the execution of a quantum program corresponding to a specific problem on a virtual architecture (that is, a quantum virtual machine) built from the resources of an ordinary computer. In general, a quantum program corresponding to the specific problem must be constructed. The quantum program referred to in the embodiments of the present application is a program written in a classical language that characterizes qubits and their evolution, in which the qubits, quantum logic gates, and other elements related to quantum computation are all represented by corresponding classical code.
A quantum circuit, also called a quantum logic circuit, is one embodiment of a quantum program and the most commonly used general-purpose quantum computing model. It represents a circuit that operates on qubits at an abstract level. It is composed of qubits, lines (timelines), and various quantum logic gates, and the result usually has to be read out at the end through a quantum measurement operation.
Unlike a conventional circuit, which is connected by metal wires that carry voltage or current signals, a quantum circuit can be regarded as connected by time: the state of a qubit evolves naturally over time, as directed by the Hamiltonian operator, until it encounters a logic gate and is operated on.
A quantum program as a whole corresponds to one total quantum circuit; the quantum program referred to in this application is that total quantum circuit, and the total number of qubits in the total quantum circuit equals the total number of qubits in the quantum program. It can be understood as follows: a quantum program may consist of a quantum circuit, measurement operations on the qubits in the quantum circuit, registers that hold the measurement results, and control flow nodes (jump instructions), and one quantum circuit may contain tens, hundreds, or even thousands of quantum logic gate operations. Executing a quantum program means executing all of its quantum logic gates in a certain time order. Note that the time order is the temporal sequence in which the individual quantum logic gates are executed.
It should be noted that in classical computation the most basic unit is the bit and the most basic control mode is the logic gate; a control circuit achieves its purpose through a combination of logic gates. Similarly, qubits are manipulated by quantum logic gates. Quantum logic gates make quantum states evolve and are the basis of quantum circuits. They include single-qubit quantum logic gates, such as the Hadamard gate (H gate), Pauli-X gate (X gate), Pauli-Y gate (Y gate), Pauli-Z gate (Z gate), RX gate (RX rotation gate), RY gate (RY rotation gate), and RZ gate (RZ rotation gate), and multi-qubit quantum logic gates, such as the CNOT gate, CR gate, iSWAP gate, and Toffoli gate. A quantum logic gate is generally represented by a unitary matrix, which is not only a matrix but also an operation and a transformation. The effect of a quantum logic gate on a quantum state is generally computed by multiplying the unitary matrix by the vector corresponding to the ket of the quantum state. For example, the vector corresponding to the ket $|0\rangle$ may be $[1, 0]^T$, and the vector corresponding to the ket $|1\rangle$ may be $[0, 1]^T$.
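As a minimal sketch of the matrix-vector picture described above, the following NumPy snippet applies an H gate and an RY gate to a basis ket. The variable and function names are illustrative and do not come from the application.

```python
import numpy as np

# Computational basis kets as vectors: |0> = [1, 0]^T, |1> = [0, 1]^T.
ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

# Hadamard gate as a 2x2 unitary matrix.
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)

def ry(theta):
    """RY rotation gate: rotation about the Y axis by angle theta."""
    c, s = np.cos(theta / 2.0), np.sin(theta / 2.0)
    return np.array([[c, -s], [s, c]])

# Applying a gate to a state is a matrix-vector product.
plus = H @ ket0               # (|0> + |1>) / sqrt(2)
flipped = ry(np.pi) @ ket0    # RY(pi)|0> = |1>
```

Both gates are unitary, so the norm of the state vector is preserved.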
Referring to fig. 2, fig. 2 is a flowchart illustrating a molecular property prediction method according to an exemplary embodiment, and the embodiment of the present application provides a molecular property prediction method, which includes:
S201, obtaining graph data of a molecule to be predicted.
The feature vector of each node of the graph data represents one atom of a molecule to be predicted, two nodes with connection relations (edges) in the graph data are adjacent nodes, and an edge between the adjacent nodes represents a chemical bond between the two atoms.
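A minimal sketch of such graph data for a toy molecule is given below. The choice of water as the example, the two-component atom features (atomic number and number of bonded atoms), and the helper name `build_graph` are assumptions for illustration; the application does not specify which atom features are used.

```python
import numpy as np

# Hypothetical toy molecule: water, with two O-H bonds.
atoms = ["O", "H", "H"]
bonds = [(0, 1), (0, 2)]  # undirected edges, one per chemical bond

# Assumed lookup table for one illustrative atom feature.
atomic_number = {"H": 1.0, "O": 8.0}

def build_graph(atoms, bonds):
    """Return (node feature matrix, adjacency matrix) for a molecule."""
    n = len(atoms)
    adjacency = np.zeros((n, n), dtype=int)
    for i, j in bonds:
        adjacency[i, j] = adjacency[j, i] = 1
    degree = adjacency.sum(axis=1)
    # Assumed features per atom: [atomic number, number of bonded atoms].
    features = np.array(
        [[atomic_number[a], float(d)] for a, d in zip(atoms, degree)]
    )
    return features, adjacency

features, adjacency = build_graph(atoms, bonds)
# Adjacent nodes of each node, read off the adjacency matrix.
neighbors = {i: np.flatnonzero(adjacency[i]).tolist() for i in range(len(atoms))}
```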
S202, aiming at each node in the graph data, carrying out feature enhancement on the feature vector of the node and the feature vector of the adjacent node, and splicing to obtain a high-dimensional feature vector.
For example, the feature vector of node i is $h_i$ and the feature vector of node j is $h_j$. If node i and node j are adjacent nodes, the feature vectors $h_i$ and $h_j$ can be feature-enhanced by the parameter $w$ to obtain the enhanced feature vectors $wh_i$ and $wh_j$, which are spliced to obtain the high-dimensional feature vector $[wh_i \,\Vert\, wh_j]$, where the double vertical bars represent splicing (concatenation).
It will be appreciated that a high-dimensional feature vector between every two adjacent nodes in the graph data can be obtained in the manner described above.
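The enhancement and splicing step for one pair of adjacent nodes can be sketched as follows. Treating the parameter w as a square weight matrix, and the concrete shapes and values, are assumptions made for illustration.

```python
import numpy as np

def enhance_and_concat(h_i, h_j, w):
    """Return the spliced vector [w h_i || w h_j] for two adjacent nodes."""
    return np.concatenate([w @ h_i, w @ h_j])

rng = np.random.default_rng(0)
w = rng.normal(size=(2, 2))      # assumed: learnable 2x2 weight matrix
h_i = np.array([8.0, 2.0])       # feature vector of node i (illustrative values)
h_j = np.array([1.0, 1.0])       # feature vector of adjacent node j
z_ij = enhance_and_concat(h_i, h_j, w)   # 4 features, one per qubit later on
```

With two-component node features the spliced vector has four components, which matches the four-qubit example used for the circuit.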
S203, mapping each high-dimensional feature vector through a variational quantum circuit to obtain a correlation coefficient between adjacent node feature vectors in the graph data.
S204, determining a molecular property prediction result of the molecule to be predicted based on the correlation coefficient.
With the embodiment of the application, graph data of the molecule to be predicted is acquired; for each node in the graph data, the feature vector of the node and the feature vector of an adjacent node are feature-enhanced and spliced to obtain a high-dimensional feature vector; each high-dimensional feature vector is mapped through a variational quantum circuit to obtain a correlation coefficient between adjacent node feature vectors in the graph data; and the molecular property prediction result of the molecule to be predicted is determined according to the correlation coefficient. Because the high-dimensional feature vectors between adjacent nodes are mapped through the variational quantum circuit, the parallel computation and entanglement characteristics of the circuit can be used to compute features of complex graph data accurately and obtain the correlation coefficients between adjacent nodes, so that the molecular properties of the molecule to be predicted can be predicted from the correlation coefficients and the accuracy of the prediction result is improved.
In another embodiment of the present application, the variational quantum circuit includes an encoding layer, a parameter-containing layer, and a measurement layer. As shown in FIG. 3, S203, mapping each high-dimensional feature vector through the variational quantum circuit to obtain a correlation coefficient between adjacent nodes in the graph data, may be specifically implemented as follows:
S2031, encoding each high-dimensional feature vector onto a preset number of qubits using the encoding layer.
In this embodiment, the encoding layer includes an H gate and a first RY gate, and S2031 may be specifically implemented as:
applying the H gate to the preset number of qubits so that each qubit evolves from the initial state to a superposition state, and applying the first RY gate to each qubit with quantum gate parameters determined by the high-dimensional feature vector, so that the high-dimensional feature vector is encoded onto the preset number of qubits in the superposition state.
The preset number may be set according to the number of features included in the high-dimensional feature vector, for example, the number of features included in the high-dimensional feature vector obtained through S202 is 4, and the preset number may be set to 4, and each high-dimensional feature vector is encoded to 4 qubits.
Referring to FIG. 4, FIG. 4 is a schematic diagram of a variational quantum circuit according to an embodiment of the present application. The variational quantum circuit shown in FIG. 4 includes four qubits, q0 to q3, and further includes an encoding layer, a parameter-containing layer, and a measurement layer. The encoding layer includes an H gate and a first RY gate. The H gates act on the initial state $|0\rangle^{\otimes 4}$ of the four qubits q0 to q3 and convert it into the superposition state $\left(\frac{|0\rangle + |1\rangle}{\sqrt{2}}\right)^{\otimes 4}$. The high-dimensional feature vector $a = (a_0, a_1, a_2, a_3)$ is then used as the parameters of the first RY gates, in the form RY($a_j$), j = 0, 1, 2, 3. The first RY gates act on the qubits in the superposition state, so that the features contained in the high-dimensional feature vector are mapped one by one onto the corresponding qubits.
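The encoding layer can be sketched as a state-vector simulation in NumPy. Because H and RY are single-qubit gates and the initial state is a product state, the encoded state is the tensor product of the individual qubit states. The function name `encode` and the convention that qubit 0 is the leftmost tensor factor are assumptions of this sketch.

```python
import numpy as np
from functools import reduce

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)

def ry(theta):
    """RY rotation gate."""
    c, s = np.cos(theta / 2.0), np.sin(theta / 2.0)
    return np.array([[c, -s], [s, c]])

def encode(features):
    """Encoding layer: H, then RY(a_j), on qubit j, starting from |0...0>."""
    ket0 = np.array([1.0, 0.0])
    # State of each qubit after its own H and RY gate.
    single_qubit_states = [ry(a) @ H @ ket0 for a in features]
    # Assumed convention: qubit 0 is the leftmost (most significant) factor.
    return reduce(np.kron, single_qubit_states)

state = encode(np.array([0.1, 0.2, 0.3, 0.4]))   # 16 amplitudes for 4 qubits
```

With all features equal to zero, the RY gates reduce to the identity and the result is the uniform superposition with every amplitude equal to 1/4.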
S2032, performing quantum state evolution on the encoded qubits using the parameter-containing layer.
Wherein the parameter-containing layer comprises an RX gate, a CNOT gate and a second RY gate.
In this embodiment of the present application, the main function of the CNOT gates is to create quantum entanglement so that information can be exchanged and transferred between the qubits. As shown in FIG. 4, CNOT gates entangle each pair of adjacent qubits as well as the first and the last qubit. In addition, the variational quantum circuit introduces an RX gate and a second RY gate, which contain the trainable parameters ψ and Φ, respectively. By iteratively optimizing the trainable parameters ψ and Φ, the variational quantum circuit is optimized so that it can accurately learn the correlation between adjacent nodes.
It should be noted that, in the variational quantum circuit provided by the embodiment of the application, the layers can be stacked multiple times according to the specific quantum encoding task, increasing the circuit depth in search of a better variational quantum circuit.
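A parameter-containing layer of this kind can be sketched with a small NumPy state-vector simulator. The gate order used here (RX on every qubit, then a ring of CNOT gates q0→q1, q1→q2, q2→q3, q3→q0, then RY on every qubit) is an assumption, since FIG. 4 is not reproduced; the helper names are illustrative.

```python
import numpy as np

def rx(theta):
    """RX rotation gate."""
    c, s = np.cos(theta / 2.0), np.sin(theta / 2.0)
    return np.array([[c, -1j * s], [-1j * s, c]])

def ry(theta):
    """RY rotation gate."""
    c, s = np.cos(theta / 2.0), np.sin(theta / 2.0)
    return np.array([[c, -s], [s, c]], dtype=complex)

def apply_1q(state, gate, q, n):
    """Apply a 2x2 gate to qubit q (qubit 0 = most significant bit)."""
    tensor = state.reshape([2] * n)
    tensor = np.tensordot(gate, tensor, axes=([1], [q]))  # new axis lands first
    return np.moveaxis(tensor, 0, q).reshape(-1)

def apply_cnot(state, control, target, n):
    """Flip the target qubit wherever the control qubit is 1."""
    tensor = state.reshape([2] * n).copy()
    index = [slice(None)] * n
    index[control] = 1
    # Selecting control = 1 drops one axis, so later axes shift down by one.
    axis = target if target < control else target - 1
    tensor[tuple(index)] = np.flip(tensor[tuple(index)], axis=axis).copy()
    return tensor.reshape(-1)

def parameterized_layer(state, psi, phi):
    """Assumed order: RX(psi_q), ring of CNOTs, RY(phi_q)."""
    n = len(psi)
    state = np.asarray(state, dtype=complex)
    for q in range(n):
        state = apply_1q(state, rx(psi[q]), q, n)
    for q in range(n):
        state = apply_cnot(state, q, (q + 1) % n, n)
    for q in range(n):
        state = apply_1q(state, ry(phi[q]), q, n)
    return state

n_qubits = 4
initial = np.zeros(2 ** n_qubits, dtype=complex)
initial[0] = 1.0                                   # |0000>
rng = np.random.default_rng(0)
evolved = parameterized_layer(
    initial, rng.uniform(0, np.pi, n_qubits), rng.uniform(0, np.pi, n_qubits)
)
```

Stacking the layer several times, as the text allows, amounts to calling `parameterized_layer` repeatedly with a separate parameter set for each repetition.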
S2033, measuring the quantum state of any one of the evolved qubits using the measurement layer, and calculating the expected value of the measurement result to obtain the correlation coefficient.
The measurement layer is the last layer of the variational quantum circuit. Its function is to decohere the qubits and convert the quantum data into classical data.
Illustratively, as shown in fig. 4, the measurement layer may measure the quantum state of the first quantum bit q0, calculate the expected value, and obtain the correlation coefficient.
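The expected value of a measurement on one qubit can be sketched as follows. Using the Pauli-Z observable is an assumption of this sketch (the application does not name the observable); under that assumption the correlation coefficient lies in the range [-1, 1].

```python
import numpy as np

def expectation_z(state, qubit, n):
    """<Z> on one qubit: P(qubit = 0) - P(qubit = 1)."""
    probs = np.abs(np.asarray(state).reshape([2] * n)) ** 2
    other_axes = tuple(a for a in range(n) if a != qubit)
    p0, p1 = probs.sum(axis=other_axes)   # marginal distribution of the qubit
    return float(p0 - p1)

n_qubits = 4
zero_state = np.zeros(2 ** n_qubits)
zero_state[0] = 1.0                                   # |0000>
uniform = np.full(2 ** n_qubits, 0.25)                # equal superposition

e_zero = expectation_z(zero_state, 0, n_qubits)
e_uniform = expectation_z(uniform, 0, n_qubits)
```

On quantum hardware the same value would be estimated from repeated measurements rather than computed from the amplitudes.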
In this embodiment of the application, the encoding layer of the variational quantum circuit encodes the high-dimensional feature vector between adjacent nodes onto a preset number of qubits, the parameter-containing layer evolves the quantum state of the encoded qubits, and the measurement layer measures the quantum state of any one of the evolved qubits and calculates the expected value, yielding the correlation coefficient between the adjacent nodes. Complex molecular graph data is processed by means of the parallel computation and quantum state entanglement characteristics of the variational quantum circuit, so the accuracy and speed of molecular property prediction can be improved.
In another embodiment of the present application, the determining the molecular property prediction result of the molecule to be predicted based on the correlation coefficient in S204 may be specifically implemented as:
Step one: performing graph attention calculation according to the correlation coefficient, and performing a graph pooling operation on the calculation result to obtain a graph data representation of the molecule to be predicted.
Specifically, the correlation coefficient can be normalized to obtain an attention coefficient, the feature vector of each node of the graph data is updated through the attention coefficient calculation to obtain updated graph data, and the graph pooling operation is performed on the updated graph data to obtain the graph data representation of the molecule to be predicted.
Further, the correlation coefficient may be normalized by the following formula:
Figure SMS_23
where Figure SMS_24 represents the attention coefficient, Figure SMS_25 represents the correlation coefficient of the feature vectors of two adjacent nodes in the graph data, k represents the summation index, N represents the number of adjacent nodes of the i-th node, and Figure SMS_26 represents an activation function.
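As a minimal sketch of such a normalization, the example below assumes the softmax form commonly used in graph attention networks, with LeakyReLU as the activation function. The formula image itself is not reproduced in this text, so both the exponential form and the choice of LeakyReLU are assumptions, and the function names are hypothetical. Every node is assumed to have at least one adjacent node.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    # Assumed activation; the text only says "an activation function".
    return np.where(x > 0, x, slope * x)

def attention_coefficients(e, adjacency):
    """Normalize correlation coefficients over each node's adjacent nodes.

    e         : (n, n) array, e[i, j] is the correlation coefficient of nodes i and j
    adjacency : (n, n) boolean array, True where node j is adjacent to node i
    """
    scores = leaky_relu(e)
    # Non-adjacent pairs get -inf so that they receive zero weight.
    scores = np.where(adjacency, scores, -np.inf)
    # Subtract the row maximum for numerical stability.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)
```

Each row of the result sums to 1, so the attention coefficients of a node can be read as weights over its adjacent nodes.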
Further, the feature vector of each node of the graph data may be updated by the following formula:
Figure SMS_27
where the double vertical bars represent splicing (concatenation), Figure SMS_28 represents the updated feature vector of the i-th node, W represents the weight matrix of the linear transformation, Figure SMS_29 represents the feature vector of the j-th node before the update, Figure SMS_30 represents the activation function, and K represents the number of independent attention coefficient matrices.
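The update can be sketched as follows, assuming the usual multi-head form in which each of the K attention coefficient matrices weights the linearly transformed features of adjacent nodes, an activation is applied, and the K results are spliced. The use of a separate weight matrix per head, the tanh default, and the function name are assumptions made for illustration.

```python
import numpy as np

def update_node_features(h, alphas, weights, activation=np.tanh):
    """Multi-head node update followed by splicing of the K heads.

    h       : (n, d) array of node feature vectors before the update
    alphas  : list of K (n, n) attention coefficient matrices
    weights : list of K (d_out, d) weight matrices of the linear transformation
    """
    heads = []
    for alpha, W in zip(alphas, weights):
        transformed = h @ W.T                         # linear transformation W h_j
        heads.append(activation(alpha @ transformed)) # weighted sum over adjacent nodes
    return np.concatenate(heads, axis=1)              # splice the K heads
```

The output has shape (n, K * d_out), one updated feature vector per node.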
Step two: inputting the graph data representation of the molecule to be predicted into a fully connected network to obtain the molecular property prediction result of the molecule to be predicted.
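A minimal sketch of the pooling and read-out stage is given below. The text does not specify the pooling operator or the network shape, so mean pooling, a single hidden layer with ReLU, and all parameter names are assumptions.

```python
import numpy as np

def predict_property(h, W1, b1, W2, b2):
    """Pool updated node features into a graph representation and apply a small
    fully connected network (hypothetical two-layer shape)."""
    graph_repr = h.mean(axis=0)                      # assumed mean pooling over nodes
    hidden = np.maximum(0.0, W1 @ graph_repr + b1)   # hidden layer with ReLU
    return W2 @ hidden + b2                          # logits or regression value
```

For a classification task the output would be passed through a softmax; for a regression task it can be used directly.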
The molecular property prediction method provided by the embodiment of the application can be used for performing classification tasks, such as molecular toxicity prediction, and also can be used for performing regression tasks, such as molecular energy gap prediction. Different loss functions may be selected depending on the task.
When performing a classification task, the loss function is a cross-entropy loss function:
Figure SMS_31
where the probability distribution Figure SMS_32 is the desired output and the probability distribution Figure SMS_33 is the actual output. The smaller the value of the cross-entropy loss function, the closer the probability distribution of the actual output Figure SMS_34 is to the probability distribution of the desired output.
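The standard definition of cross entropy between a desired distribution p and an actual distribution q is the negative sum of p(x) log q(x). The sketch below assumes that this standard form is the one intended; the clipping constant `eps` is a numerical safeguard added for the example.

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """Cross entropy between the desired distribution p and the actual output q."""
    p = np.asarray(p, dtype=float)
    q = np.clip(np.asarray(q, dtype=float), eps, 1.0)  # avoid log(0)
    return float(-np.sum(p * np.log(q)))
```

With a one-hot desired output, the loss is zero when the model assigns probability 1 to the correct class and grows as that probability shrinks.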
When performing a regression task, the loss function is a mean square error loss function:
Figure SMS_35
where Figure SMS_36 is the data label value and Figure SMS_37 is the model output value. The model parameters are optimized according to the calculated loss function value until the loss function value reaches the preset threshold, at which point model training is confirmed to be complete. An Adam gradient update algorithm with a learning rate of 0.001 may be selected.
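The training procedure can be sketched with a mean square error loss and a hand-written Adam step. The one-parameter linear model, the threshold value, the step limit, and the Adam hyperparameters other than the learning rate of 0.001 are assumptions chosen to keep the example short.

```python
import numpy as np

def mse(y, y_hat):
    """Mean square error between label values y and model outputs y_hat."""
    return float(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)          # bias-corrected second moment
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def train(x, y, threshold=1e-4, max_steps=5000, lr=0.001):
    """Fit the toy model y_hat = w * x until the loss reaches the threshold."""
    w, m, v = 0.0, 0.0, 0.0
    loss = mse(y, w * x)
    for t in range(1, max_steps + 1):
        y_hat = w * x
        loss = mse(y, y_hat)
        if loss <= threshold:             # preset threshold reached
            break
        grad = float(np.mean(2.0 * (y_hat - y) * x))
        w, m, v = adam_step(w, grad, m, v, t, lr=lr)
    return w, loss
```

In the method itself the trainable parameters would include the circuit parameters ψ and Φ as well as the classical weights; the scalar `w` only stands in for them here.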
Based on the same inventive concept, the embodiments of the present application also provide a molecular property prediction apparatus, as shown in fig. 5, including:
an obtaining module 501, configured to obtain graph data of a molecule to be predicted, where a feature vector of each node of the graph data represents one atom of the molecule to be predicted;
a splicing module 502, configured to, for each node in the graph data, perform feature enhancement on the feature vector of the node and the feature vector of the adjacent node, and splice them to obtain a high-dimensional feature vector;
a mapping module 503, configured to map each high-dimensional feature vector through a variational quantum circuit to obtain a correlation coefficient between feature vectors of adjacent nodes in the graph data;
a determining module 504, configured to determine a molecular property prediction result of the molecule to be predicted based on the correlation coefficient.
Optionally, the variational quantum circuit includes an encoding layer, a parameterized layer, and a measurement layer, and the mapping module 503 is specifically configured to:
encode each high-dimensional feature vector onto a preset number of qubits by using the encoding layer;
perform quantum state evolution on the encoded qubits by using the parameterized layer;
measure the quantum state of any qubit after evolution by using the measurement layer, and calculate the expected value of the measurement result to obtain the correlation coefficient.
Optionally, the encoding layer includes an H gate and a first RY gate, and the mapping module 503 is specifically configured to:
apply the H gate to the preset number of qubits so that each qubit evolves from the initial state to a superposition state, and apply the first RY gate to each qubit according to quantum gate parameters determined by the high-dimensional feature vector, so that the high-dimensional feature vector is encoded onto the preset number of qubits in the superposition state.
Optionally, the parameterized layer includes an RX gate, a CNOT gate, and a second RY gate.
Optionally, the determining module 504 is specifically configured to:
perform graph attention calculation according to the correlation coefficient, and perform a graph pooling operation on the calculation result to obtain a graph data representation of the molecule to be predicted;
input the graph data representation of the molecule to be predicted into a fully connected network to obtain the molecular property prediction result of the molecule to be predicted.
Optionally, the determining module 504 is specifically configured to:
normalize the correlation coefficient to obtain an attention coefficient;
update the feature vector of each node of the graph data through the attention coefficient calculation to obtain updated graph data;
perform the graph pooling operation on the updated graph data to obtain the graph data representation of the molecule to be predicted.
Optionally, the correlation coefficient is normalized by the following formula:
Figure SMS_38
where Figure SMS_39 represents the attention coefficient, Figure SMS_40 represents the correlation coefficient of the feature vectors of two adjacent nodes in the graph data, k represents the summation index, N represents the number of adjacent nodes of the i-th node, and Figure SMS_41 represents an activation function.
Optionally, the feature vector of each node of the graph data is updated by the following formula:
Figure SMS_42
where the double vertical bars represent splicing (concatenation), Figure SMS_43 represents the updated feature vector of the i-th node, W represents the weight matrix of the linear transformation, Figure SMS_44 represents the feature vector of the j-th node before the update, Figure SMS_45 represents the activation function, and K represents the number of independent attention coefficient matrices.
The specific manner in which each module of the apparatus in the above embodiments performs its operations has been described in detail in the method embodiments and will not be repeated here.
Still another embodiment of the present application provides a storage medium having a computer program stored therein, wherein the computer program is configured to perform the steps in the molecular property prediction method embodiments described above when run.
Specifically, in the present embodiment, the storage medium may include, but is not limited to: a USB flash drive, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing a computer program.
Yet another embodiment of the present application provides an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of the molecular property prediction method embodiments described above.
Specifically, the electronic apparatus may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.
Specifically, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
Step one: obtaining graph data of the molecule to be predicted.
Step two: for each node in the graph data, performing feature enhancement on the feature vector of the node and the feature vector of the adjacent node, and splicing them to obtain a high-dimensional feature vector.
Step three: mapping each high-dimensional feature vector through a variational quantum circuit to obtain a correlation coefficient between adjacent node feature vectors in the graph data.
Step four: determining a molecular property prediction result of the molecule to be predicted based on the correlation coefficient.
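The data flow of the four steps can be sketched as a simple composition of callables. The names `splice`, `circuit`, and `readout` are hypothetical placeholders for the feature enhancement and splicing, the variational quantum circuit, and the attention, pooling, and fully connected stages; none of them is defined by the text in this form.

```python
import numpy as np

def predict_molecular_property(node_features, edges, splice, circuit, readout):
    """Wire the four steps together for one molecule.

    node_features : (n, d) array, one feature vector per atom (step one)
    edges         : iterable of (i, j) index pairs of adjacent nodes
    splice        : callable(h_i, h_j) -> high-dimensional feature vector (step two)
    circuit       : callable(vector) -> correlation coefficient (step three)
    readout       : callable(node_features, coefficients) -> prediction (step four)
    """
    coefficients = {}
    for i, j in edges:
        high_dim = splice(node_features[i], node_features[j])
        coefficients[(i, j)] = circuit(high_dim)
    return readout(node_features, coefficients)
```

Keeping the three stages as separate callables mirrors the splicing, mapping, and determining modules of the apparatus.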
The construction, features, and advantages of the present application have been described in detail above with reference to the embodiments shown in the drawings. The above description covers only preferred embodiments of the application, and the scope of the application is not limited to the embodiments shown in the drawings.

Claims (11)

1. A method of predicting molecular properties, the method comprising:
obtaining graph data of a molecule to be predicted, wherein a feature vector of each node of the graph data represents one atom of the molecule to be predicted;
for each node in the graph data, carrying out feature enhancement on the feature vector of the node and the feature vector of the adjacent node, and splicing to obtain a high-dimensional feature vector;
mapping each high-dimensional feature vector through a variational quantum circuit to obtain a correlation coefficient between adjacent node feature vectors in the graph data;
and determining a molecular property prediction result of the molecule to be predicted based on the correlation coefficient.
2. The method of claim 1, wherein the variational quantum circuit comprises an encoding layer, a parameterized layer, and a measurement layer, and wherein the mapping each high-dimensional feature vector through the variational quantum circuit to obtain a correlation coefficient between adjacent node feature vectors in the graph data comprises:
encoding each high-dimensional feature vector onto a preset number of qubits by using the encoding layer;
performing quantum state evolution on the encoded qubits by using the parameterized layer;
measuring the quantum state of any qubit after evolution by using the measurement layer, and calculating the expected value of the measurement result to obtain the correlation coefficient.
3. The method of claim 2, wherein the encoding layer includes an H-gate and a first RY-gate, wherein the encoding each high-dimensional feature vector to a preset number of qubits with the encoding layer comprises:
applying the H gate to the preset number of qubits so that each qubit evolves from an initial state to a superposition state, and applying the first RY gate to each qubit according to quantum gate parameters determined by the high-dimensional feature vector, so that the high-dimensional feature vector is encoded onto the preset number of qubits in the superposition state.
4. A method according to claim 2 or 3, wherein the parameterized layer comprises an RX gate, a CNOT gate and a second RY gate.
5. The method of claim 1, wherein said determining a molecular property prediction of said molecule to be predicted based on said correlation coefficient comprises:
performing graph attention calculation according to the correlation coefficient, and performing a graph pooling operation on the calculation result to obtain a graph data representation of the molecule to be predicted;
inputting the graph data representation of the molecule to be predicted into a fully connected network to obtain the molecular property prediction result of the molecule to be predicted.
6. The method according to claim 5, wherein the performing graph attention calculation according to the correlation coefficient, and performing a graph pooling operation on a calculation result to obtain the graph data representation of the molecule to be predicted, includes:
normalizing the correlation coefficient to obtain an attention coefficient;
updating the feature vector of each node of the graph data through the attention coefficient calculation to obtain updated graph data;
performing the graph pooling operation on the updated graph data to obtain the graph data representation of the molecule to be predicted.
7. The method of claim 6, wherein the correlation coefficient is normalized by the following formula:
Figure QLYQS_1
where Figure QLYQS_2 represents the attention coefficient, Figure QLYQS_3 represents the correlation coefficient of the feature vectors of two adjacent nodes in the graph data, k represents the summation index, N represents the number of adjacent nodes of the i-th node, and Figure QLYQS_4 represents an activation function.
8. The method of claim 7, wherein the feature vector for each node of the graph data is updated by the following formula:
Figure QLYQS_5
where the double vertical bars represent splicing (concatenation), Figure QLYQS_6 represents the updated feature vector of the i-th node, W represents the weight matrix of the linear transformation, Figure QLYQS_7 represents the feature vector of the j-th node before the update, Figure QLYQS_8 represents the activation function, and K represents the number of independent attention coefficient matrices.
9. A molecular property prediction apparatus, the apparatus comprising:
an acquisition module, configured to obtain graph data of a molecule to be predicted, wherein a feature vector of each node of the graph data represents one atom of the molecule to be predicted;
a splicing module, configured to, for each node in the graph data, perform feature enhancement on the feature vector of the node and the feature vector of the adjacent node, and splice them to obtain a high-dimensional feature vector;
a mapping module, configured to map each high-dimensional feature vector through a variational quantum circuit to obtain a correlation coefficient between adjacent node feature vectors in the graph data;
a determining module, configured to determine a molecular property prediction result of the molecule to be predicted based on the correlation coefficient.
10. A storage medium having a computer program stored therein, wherein the computer program is arranged to perform the method of any of claims 1 to 8 when run.
11. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to run the computer program to perform the method of any of the claims 1 to 8.
CN202310250808.4A 2023-03-16 2023-03-16 Molecular property prediction method and device, storage medium and electronic device Pending CN116306952A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310250808.4A CN116306952A (en) 2023-03-16 2023-03-16 Molecular property prediction method and device, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310250808.4A CN116306952A (en) 2023-03-16 2023-03-16 Molecular property prediction method and device, storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN116306952A true CN116306952A (en) 2023-06-23

Family

ID=86814560

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310250808.4A Pending CN116306952A (en) 2023-03-16 2023-03-16 Molecular property prediction method and device, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN116306952A (en)

Similar Documents

Publication Publication Date Title
CN115144934B (en) Meteorological prediction method based on variable component sub-line and related equipment
CN113222155B (en) Quantum circuit construction method and device, electronic device and storage medium
CN116187548A (en) Photovoltaic power generation power prediction method and device, storage medium and electronic device
CN116403657A (en) Drug response prediction method and device, storage medium and electronic device
CN116011682A (en) Meteorological data prediction method and device, storage medium and electronic device
CN114358317B (en) Data classification method based on machine learning framework and related equipment
CN116306952A (en) Molecular property prediction method and device, storage medium and electronic device
CN114862079A (en) Risk value estimation method, device, medium, and electronic device based on quantum line
CN116011681A (en) Meteorological data prediction method and device, storage medium and electronic device
CN114819168B (en) Quantum comparison method and device for matrix eigenvalues
CN114819169B (en) Quantum estimation method and device for matrix condition number
CN116400430A (en) Meteorological data prediction method and device, storage medium and electronic device
CN114764620B (en) Quantum convolution operator
CN114764619B (en) Convolution operation method and device based on quantum circuit
CN114819163B (en) Training method and device for quantum generation countermeasure network, medium and electronic device
CN114820182B (en) Quantum searching method and device for coordination pairs in financial transaction data
CN117035105A (en) Data prediction method and related equipment
CN117852665A (en) Quantum circuit-based physical system state prediction method and related device
CN116167407A (en) Quantum circulation neural network-based data prediction method and related equipment
CN116431807A (en) Text classification method and device, storage medium and electronic device
CN116499466A (en) Intelligent navigation method and device, storage medium and electronic device
CN115131120A (en) Quantum option estimation method based on least square method and related device
CN117877611A (en) Method and device for predicting molecular properties
CN116263883A (en) Quantum linear solving method, device and equipment based on polynomial preprocessor
CN117875370A (en) Task processing method and device utilizing molecular data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination