Disclosure of Invention
In view of the above-mentioned drawbacks in the prior art, the present invention provides a protein structure classification system based on a quantum convolutional neural network, which includes: an encoding module for protein sequence amino acid feature data, a quantum convolution and pooling module, a loss function construction module and a quantum circuit parameter update module,
the encoding module for protein sequence amino acid feature data is used for extracting and reading protein sequence information and the corresponding structural information from the protein structure classification data set;
the quantum convolution and pooling module is used for performing the protein structure classification by means of parameterized quantum gates;
the loss function construction module is used for obtaining a loss function characterizing the system performance;
the quantum circuit parameter update module is used for updating the quantum circuit parameters.
Wherein the protein structure classification data set is divided into a training data set and a test data set in a ratio of 99:1.
Wherein the quantum convolution and pooling module comprises:
the quantum convolution layer basic unit is used for evolving the quantum state loaded with the protein sequence characteristic information;
a quantum-pooling layer basic unit for mapping information of two qubits onto one qubit.
Wherein the quantum convolution and pooling module is further configured to apply the quantum convolution layer and the quantum pooling layer alternately until only one qubit remains, and to measure the Pauli Z expectation value of that last qubit as the final predicted value of the protein structure classification.
Wherein the loss function construction module is used for inputting the protein amino acid sequence feature data $x_i$ ($i = 1, \dots, K$) of each batch $b$ into the quantum convolution and pooling module, through which each protein amino acid sequence obtains a predicted value $\hat{y}_i$; a loss function characterizing the system performance is then obtained by computing the mean square error of the predicted values of all protein amino acid sequences in the batch relative to their true labels $y_i$.
Wherein the loss function is expressed by the following equation:
$$L_b = \frac{1}{K} \sum_{i=1}^{K} \left( \hat{y}_i - y_i \right)^2$$
wherein $K$ is the number of protein amino acid sequences contained in the batch $b$.
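The quantum circuit parameter update module is specifically used for computing the analytic gradient of the loss function with respect to the quantum circuit parameters based on the parameter-shift rule, and then updating the quantum circuit parameters.
Wherein computing the analytic gradient of the loss function with respect to the quantum circuit parameters based on the parameter-shift rule specifically includes:
assuming that the expectation value of a measurement operator $\hat{M}$ under the parameterized quantum circuit $U(\theta)$ can be expressed as
$$\langle \hat{M} \rangle(\theta) = \langle \psi | U^{\dagger}(\theta) \, \hat{M} \, U(\theta) | \psi \rangle$$
wherein $U(\theta)$ represents the parameterized quantum circuit composed of the quantum convolution layers and pooling layers, $\theta$ represents the parameters in the quantum convolution layers and pooling layers, and $|\psi\rangle$ denotes the quantum state loaded with the protein sequence feature information;
then the gradient of the expectation value function $\langle \hat{M} \rangle(\theta)$ with respect to the parameterized quantum circuit parameter $\theta_j$ can be expressed as
$$\frac{\partial \langle \hat{M} \rangle(\theta)}{\partial \theta_j} = \frac{1}{2} \left[ \langle \hat{M} \rangle\!\left(\theta + \frac{\pi}{2} e_j\right) - \langle \hat{M} \rangle\!\left(\theta - \frac{\pi}{2} e_j\right) \right]$$
wherein $e_j$ is the unit vector along the $j$-th parameter.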
Wherein the system trains a plurality of epochs using the training data set until a desired accuracy is reached.
Compared with the prior art, the invention realizes an efficient quantum convolutional neural network system that can classify protein structures efficiently, and the model used by the system can greatly accelerate protein structure prediction and drug development.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and "a plurality" typically includes at least two.
It should be understood that while the terms first, second, third, etc. may be used in embodiments of the present invention to describe … …, these … … should not be limited to these terms. These terms are used only to distinguish … …. For example, a first … … may also be referred to as a second … …, and similarly, a second … … may also be referred to as a first … …, without departing from the scope of embodiments of the present invention.
It should be understood that the term "and/or" as used herein merely describes an association between associated objects and means that three relationships may exist, e.g., A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
The words "if", as used herein, may be interpreted as "at … …" or "at … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrase "if determined" or "if detected (a stated condition or event)" may be interpreted as "upon determining" or "in response to determining" or "upon detecting (a stated condition or event)" or "in response to detecting (a stated condition or event)", depending on the context.
It is also noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that an article or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such article or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another like element in a commodity or device comprising the element.
Terms used in the present application:
PDB (Protein Data Bank): protein database
NISQ (Noisy Intermediate-Scale Quantum): noisy intermediate-scale quantum computer
SCOP (Structural Classification of Proteins): protein structure classification database
PSSM (Position-Specific Scoring Matrix): position-specific scoring matrix
The quantum convolutional neural network loads the feature vectors characterizing the amino acids of a protein sequence into a quantum state by amplitude encoding, and then processes this quantum state through quantum convolution layers and quantum pooling layers, which correspond to classical convolution and pooling respectively. In the process the number of qubits is progressively reduced; finally a single qubit is measured, the measurement result and the true label of the structural classification of the protein are combined into a loss function, and the parameters are updated repeatedly according to the loss function until a satisfactory threshold is reached.
Alternative embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Example I
The invention provides a protein structure classification system based on a quantum convolutional neural network, which comprises: an encoding module for protein sequence amino acid feature data, a quantum convolution and pooling module, a loss function construction module and a quantum circuit parameter update module.
The encoding module for protein sequence amino acid feature data is used for extracting protein sequence information and the corresponding structural information from a protein structure classification data set (SCOP and the like). The data set is divided into a training data set and a test data set in a ratio of 99:1. For the amino acid sequence information in the data set, each amino acid is encoded as a 20-dimensional vector using the PSSM method. Rotation gates acting on the qubits (10 qubits are used here) then load the 20-dimensional protein sequence amino acid feature data onto the amplitudes of the quantum state.
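The 99:1 division of the data set can be sketched in Python as follows. This is only an illustration under stated assumptions: the samples are assumed to be already available as a list, and the function name `split_99_1` and the shuffling seed are hypothetical, not part of the described system.

```python
import random

def split_99_1(samples, seed=0):
    """Shuffle the samples and split them 99:1 into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    # One sample in every hundred goes to the test set (at least one
    # whenever there are two or more samples in total).
    n_test = max(1, len(shuffled) // 100) if len(shuffled) > 1 else 0
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical data set of 1000 sample identifiers.
train, test = split_99_1(range(1000))
print(len(train), len(test))  # 990 10
```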
The quantum convolution and pooling module uses parameterized quantum gates to build a quantum convolution layer basic unit and a quantum pooling layer basic unit, each of which acts on two qubits. A quantum convolution layer is formed by applying the quantum convolution layer basic unit to each pair of qubits of the quantum system, thereby evolving the quantum state that the previous module loaded with the protein sequence amino acid feature information. A quantum pooling layer is then formed by applying the quantum pooling layer basic unit to each qubit pair of the quantum system, mapping the information of two qubits onto one qubit, so that 5 qubits carry the high-level information. The quantum convolution layer and the quantum pooling layer are then applied to the remaining 5 qubits, leaving 3 qubits carrying the high-level information. These steps are repeated, alternating quantum convolution and quantum pooling layers, until only one qubit remains; finally, the Pauli Z expectation value of the last qubit is measured and used as the final predicted value of the protein structure classification.
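The reduction of the qubit count described above (10, then 5, then 3, and so on down to 1) can be reproduced with a short Python sketch. It assumes that when the number of qubits is odd, the unpaired qubit is carried over to the next round, which is consistent with the step from 5 to 3 qubits; the function name `pooling_schedule` is hypothetical.

```python
def pooling_schedule(n_qubits):
    """Return the qubit counts after each convolution + pooling round."""
    counts = [n_qubits]
    while counts[-1] > 1:
        # Each pooling unit maps a pair of qubits onto one qubit;
        # an unpaired qubit (odd count) is assumed to be carried over.
        counts.append((counts[-1] + 1) // 2)
    return counts

print(pooling_schedule(10))  # [10, 5, 3, 2, 1]
```

Under this assumption, four rounds of convolution and pooling are needed to go from 10 qubits to the single measured qubit.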
The loss function construction module inputs the protein amino acid sequence feature data $x_i$ of each batch $b$ into the quantum convolutional neural network built in the previous module, so that each protein amino acid sequence obtains a predicted value $\hat{y}_i$ from the previous module. A loss function characterizing the performance of the model is then obtained by computing the mean square error of the predicted values of all protein amino acid sequences in each batch relative to their true labels.
The quantum circuit parameter update module computes the analytic gradient of the loss function of the previous module with respect to the quantum circuit parameters based on the existing parameter-shift rule, then updates the quantum circuit parameters using a classical computer, and finally trains for a plurality of epochs on the protein amino acid sequence training data set, stopping the updates once the desired accuracy is reached.
Example II
In order to further illustrate the method for predicting the protein structure based on the quantum convolution neural network, the following embodiments are provided:
The encoding module for protein sequence amino acid feature data is based on PSSM matrix data characterizing the amino acid feature attributes of each protein, as shown in Fig. 1, where each letter represents one of the 20 amino acids and each amino acid has a feature vector of dimension 20. Based on the feature attribute data of each amino acid, rotation gates load the protein amino acid sequence data onto the amplitudes of the quantum state (10 qubits are used here); the specific quantum circuits are shown in Fig. 2. Fig. 2 (a) shows the quantum circuit that encodes the 20-dimensional data of a single amino acid of a protein sequence onto 10 qubits. Fig. 2 (b) shows the encoding of the feature data of an entire protein amino acid sequence, taking the sequence shown in the figure as an example: first, the 20-dimensional feature data characterizing methionine (M) is encoded onto the amplitudes of the 10-qubit quantum state in the manner of Fig. 2 (a); then the feature data of threonine (T) is encoded onto the quantum state, and so on until the whole protein sequence has been encoded.
The quantum convolution and pooling module comprises a quantum convolution layer basic unit and a quantum pooling layer basic unit, each acting on two qubits. The corresponding quantum circuits are shown in Fig. 3: Fig. 3 (a) is an implementation of the quantum convolution layer basic unit and Fig. 3 (b) is an implementation of the quantum pooling layer basic unit. Based on these two basic units, the final predicted value is obtained by applying the quantum convolution layer and the quantum pooling layer alternately. Specifically, as shown in Fig. 4, a block C represents the quantum convolution layer basic unit of Fig. 3 (a), a block P represents the quantum pooling layer basic unit, the blocks C inside a dotted frame form the first quantum convolution layer, and the blocks P inside a dotted frame form the first quantum pooling layer. As shown in Fig. 4, alternating quantum convolution and quantum pooling layers concentrates the information containing the amino acid features of the protein sequence onto a single qubit; the Pauli Z expectation value of this last qubit is measured, and the final predicted value indicates whether or not the protein amino acid sequence has an alpha helix structure. A truncated qubit line in the figure indicates that no quantum convolution or pooling basic unit acts on that qubit any further.
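The final measurement step can be illustrated with a small NumPy sketch that computes the Pauli Z expectation value of a single-qubit state. The state vectors below are hypothetical inputs, and the decision rule in `predict_alpha_helix` (a positive expectation value is read as "alpha helix") is an assumption made for illustration only; the text does not state how the expectation value is mapped to the class label.

```python
import numpy as np

def pauli_z_expectation(state):
    """<Z> of a normalized single-qubit state [a0, a1]: |a0|^2 - |a1|^2."""
    state = np.asarray(state, dtype=complex)
    z = np.array([[1, 0], [0, -1]], dtype=complex)
    # np.vdot conjugates its first argument, giving <state| Z |state>.
    return float(np.real(np.vdot(state, z @ state)))

def predict_alpha_helix(state, threshold=0.0):
    """Hypothetical decision rule: <Z> above the threshold means 'alpha helix'."""
    return pauli_z_expectation(state) > threshold

print(pauli_z_expectation([1, 0]))  # 1.0
print(pauli_z_expectation([0, 1]))  # -1.0
```

The expectation value always lies in the interval [-1, 1], so it can serve directly as a continuous predicted value for a two-class problem.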
The loss function construction module inputs the protein amino acid sequence feature data $x_i$ of each batch $b$ into the quantum convolutional neural network built in the previous module and obtains the corresponding predicted value $\hat{y}_i$ for each protein amino acid sequence. Finally, the predicted values $\hat{y}_i$ of all protein amino acid sequences in the batch are combined with the true labels $y_i$, which state whether or not each sequence corresponds to an alpha helix structure, and the mean square error between the predicted values and the true values is computed. This gives the loss function characterizing the performance of the quantum convolutional neural network model; the mean square error loss function is expressed by the following formula:
$$L_b = \frac{1}{K} \sum_{i=1}^{K} \left( \hat{y}_i - y_i \right)^2 \qquad (1)$$
wherein $K$ is the number of protein amino acid sequences contained in the batch $b$.
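The mean square error over one batch can be sketched in Python as follows. The predicted values and labels are hypothetical numbers, and encoding the two classes as +1 and -1 is an assumption chosen to match the range of a Pauli Z expectation value.

```python
import numpy as np

def mse_loss(predictions, labels):
    """Mean square error between K predicted values and their true labels."""
    predictions = np.asarray(predictions, dtype=float)
    labels = np.asarray(labels, dtype=float)
    return float(np.mean((predictions - labels) ** 2))

# Hypothetical batch with K = 3 sequences.
batch_predictions = [0.8, -0.6, 0.1]
batch_labels = [1.0, -1.0, 1.0]
print(mse_loss(batch_predictions, batch_labels))
```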
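The quantum circuit parameter update module: first, the expectation value of a measurement operator $\hat{M}$ under the parameterized quantum circuit $U(\theta)$ can be expressed as
$$\langle \hat{M} \rangle(\theta) = \langle \psi | U^{\dagger}(\theta) \, \hat{M} \, U(\theta) | \psi \rangle \qquad (2)$$
Then the gradient of the expectation value function $\langle \hat{M} \rangle(\theta)$ with respect to the parameterized quantum circuit parameter $\theta_j$ can be expressed as
$$\frac{\partial \langle \hat{M} \rangle(\theta)}{\partial \theta_j} = \frac{1}{2} \left[ \langle \hat{M} \rangle\!\left(\theta + \frac{\pi}{2} e_j\right) - \langle \hat{M} \rangle\!\left(\theta - \frac{\pi}{2} e_j\right) \right] \qquad (3)$$
In the above formulas (2) and (3), $U(\theta)$ represents the parameterized quantum circuit composed of the quantum convolution layers and pooling layers, $\theta$ represents the parameters in the quantum convolution layers and pooling layers, $|\psi\rangle$ denotes the quantum state loaded with the protein sequence feature information, and $e_j$ is the unit vector along the $j$-th parameter.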
The above method, which gives the analytic gradient of the expectation value of an operator with respect to the parameters of a parameterized quantum circuit, is called the parameter-shift rule.
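The parameter-shift rule can be checked numerically on the smallest possible case. The sketch below is not the circuit of the invention: it assumes a single qubit, a single R_y rotation as the parameterized circuit, and the Pauli Z operator as the measurement operator, for which the expectation value is cos(theta) and the exact derivative is -sin(theta). All function names are hypothetical.

```python
import numpy as np

def ry(theta):
    """Matrix of the single-qubit rotation R_y(theta) = exp(-i * theta * Y / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def expectation(theta):
    """<Z> after applying R_y(theta) to |0>; analytically equal to cos(theta)."""
    state = ry(theta) @ np.array([1.0, 0.0])
    return float(state @ Z @ state)

def parameter_shift_gradient(theta):
    """Analytic gradient from two evaluations shifted by +pi/2 and -pi/2."""
    return 0.5 * (expectation(theta + np.pi / 2) - expectation(theta - np.pi / 2))

theta = 0.7
# Both printed values should agree up to floating-point error.
print(parameter_shift_gradient(theta), -np.sin(theta))
```

Unlike a finite-difference estimate, the two shifted evaluations give the exact derivative for gates of this form, which is why the gradient is called analytic.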
Through the parameter-shift rule, the analytic gradient of the mean square error loss function of the previous module, namely formula (1), with respect to the quantum circuit parameters of the quantum convolution layers and pooling layers can be obtained. The parameters are then updated by the gradient descent method using a classical computer. Finally, training is carried out for a plurality of epochs on the training data set of protein amino acid sequence feature data until the protein structure classification predicted by the quantum convolutional neural network provided in this patent reaches the desired accuracy.
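The training procedure (parameter-shift gradient of the predictions, chain rule for the mean square error, gradient descent on a classical computer) can be sketched on a toy model. Everything below is an assumption for illustration: the model has one parameter and stands in for the full circuit, its prediction cos(x + theta) is the Pauli Z expectation after R_y(theta) R_y(x) acts on |0>, and the features, labels, learning rate and number of epochs are hypothetical.

```python
import numpy as np

def predict(x, theta):
    """Toy stand-in for the circuit output: <Z> after R_y(theta) R_y(x)|0>."""
    return np.cos(x + theta)

def loss(xs, ys, theta):
    """Mean square error of the predictions over the batch."""
    return float(np.mean((predict(xs, theta) - ys) ** 2))

def loss_gradient(xs, ys, theta):
    """Parameter-shift rule for each prediction, then the chain rule for the MSE."""
    d_pred = 0.5 * (predict(xs, theta + np.pi / 2) - predict(xs, theta - np.pi / 2))
    return float(np.mean(2.0 * (predict(xs, theta) - ys) * d_pred))

xs = np.array([0.1, 0.2, 3.0, 3.1])    # hypothetical encoded features
ys = np.array([1.0, 1.0, -1.0, -1.0])  # hypothetical labels (+1 / -1)
theta, learning_rate = 1.0, 0.2

initial = loss(xs, ys, theta)
for epoch in range(100):
    # Classical gradient-descent update of the circuit parameter.
    theta -= learning_rate * loss_gradient(xs, ys, theta)
final = loss(xs, ys, theta)
print(initial, final)
```

In the system described here the same loop would run over all parameters of the convolution and pooling layers, with two shifted circuit evaluations per parameter, and would stop once the desired accuracy is reached rather than after a fixed number of epochs.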
Example III
As shown in Fig. 5, the present invention provides a protein structure classification system based on a quantum convolutional neural network, which includes: an encoding module for protein sequence amino acid feature data, a quantum convolution and pooling module, a loss function construction module and a quantum circuit parameter update module,
the encoding module for protein sequence amino acid feature data is used for extracting and reading protein sequence information and the corresponding structural information from the protein structure classification data set;
the quantum convolution and pooling module is used for performing the protein structure classification by means of parameterized quantum gates;
the loss function construction module is used for obtaining a loss function characterizing the system performance;
the quantum circuit parameter update module is used for updating the quantum circuit parameters.
Wherein the protein structure classification data set is divided into a training data set and a test data set in a ratio of 99:1.
Wherein the quantum convolution and pooling module comprises:
the quantum convolution layer basic unit is used for evolving the quantum state loaded with the protein sequence characteristic information;
a quantum-pooling layer basic unit for mapping information of two qubits onto one qubit.
Wherein the quantum convolution and pooling module is further configured to apply the quantum convolution layer and the quantum pooling layer alternately until only one qubit remains, and to measure the Pauli Z expectation value of that last qubit as the final predicted value of the protein structure classification.
Wherein the loss function construction module is used for inputting the protein amino acid sequence feature data $x_i$ ($i = 1, \dots, K$) of each batch $b$ into the quantum convolution and pooling module, through which each protein amino acid sequence obtains a predicted value $\hat{y}_i$; a loss function characterizing the system performance is then obtained by computing the mean square error of the predicted values of all protein amino acid sequences in the batch relative to their true labels $y_i$.
Wherein the loss function is expressed by the following equation:
$$L_b = \frac{1}{K} \sum_{i=1}^{K} \left( \hat{y}_i - y_i \right)^2$$
wherein $K$ is the number of protein amino acid sequences contained in the batch $b$.
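The quantum circuit parameter update module is specifically used for computing the analytic gradient of the loss function with respect to the quantum circuit parameters based on the parameter-shift rule, and then updating the quantum circuit parameters.
Wherein computing the analytic gradient of the loss function with respect to the quantum circuit parameters based on the parameter-shift rule specifically includes:
assuming that the expectation value of a measurement operator $\hat{M}$ under the parameterized quantum circuit $U(\theta)$ can be expressed as
$$\langle \hat{M} \rangle(\theta) = \langle \psi | U^{\dagger}(\theta) \, \hat{M} \, U(\theta) | \psi \rangle$$
wherein $U(\theta)$ represents the parameterized quantum circuit composed of the quantum convolution layers and pooling layers, $\theta$ represents the parameters in the quantum convolution layers and pooling layers, and $|\psi\rangle$ denotes the quantum state loaded with the protein sequence feature information;
then the gradient of the expectation value function $\langle \hat{M} \rangle(\theta)$ with respect to the parameterized quantum circuit parameter $\theta_j$ can be expressed as
$$\frac{\partial \langle \hat{M} \rangle(\theta)}{\partial \theta_j} = \frac{1}{2} \left[ \langle \hat{M} \rangle\!\left(\theta + \frac{\pi}{2} e_j\right) - \langle \hat{M} \rangle\!\left(\theta - \frac{\pi}{2} e_j\right) \right]$$
wherein $e_j$ is the unit vector along the $j$-th parameter.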
Wherein the system trains a plurality of epochs using the training data set until a desired accuracy is reached.
Example IV
Embodiments of the present invention provide a non-volatile computer storage medium, where computer-executable instructions are stored, and the computer-executable instructions may perform the method steps described in the above embodiments.
It should be noted that the computer readable medium mentioned above in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may be separate and not incorporated into the electronic device.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the internet using an internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present invention may be implemented by software or hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself.
The foregoing describes preferred embodiments of the present invention, and is intended to provide a clear and concise description of the spirit and scope of the invention, and not to limit the same, but to include all modifications, substitutions, and alterations falling within the spirit and scope of the invention as defined by the appended claims.