CN115691669B

CN115691669B - Protein structure classification system based on quantum convolution neural network

Info

Publication number: CN115691669B
Application number: CN202310000900.5A
Authority: CN
Inventors: 胡咏梅; 刘海建; 耿咏忠; 李宁; 杨昱升; 赵立祥; 崔国龙
Original assignee: Beijing Zhongke Arc Quantum Software Technology Co ltd; Sinopharm Bio Pharmaceutical Co Ltd
Current assignee: Beijing Zhongke Arc Quantum Software Technology Co ltd; Sinopharm Bio Pharmaceutical Co Ltd
Priority date: 2023-01-03
Filing date: 2023-01-03
Publication date: 2023-03-17
Anticipated expiration: 2043-01-03
Also published as: CN115691669A

Abstract

The invention discloses a protein structure classification system based on a quantum convolutional neural network. The system comprises: an encoding module for protein sequence amino acid feature data, used to extract and read protein sequence information and the corresponding structural information from a protein structure classification data set; a quantum convolution and pooling module, used to classify protein structures by means of parameterized quantum gates; a loss function construction module, used to obtain a loss function that characterizes system performance; and a quantum circuit parameter update module, used to update the quantum circuit parameters. Compared with the prior art, the invention realizes an efficient quantum convolutional neural network system for quantum computers that can efficiently classify protein structures, and the model used by the system can greatly accelerate protein structure prediction and drug development.

Description

Protein structure classification system based on quantum convolution neural network

Technical Field

The invention belongs to the technical field of quantum computers, and particularly relates to a protein structure classification system based on a quantum convolutional neural network.

Background

Proteins are the main carriers of life activities, and their functions are closely related to their structures. Effectively classifying protein structures can greatly improve the accuracy of protein structure prediction. At present, classical machine learning algorithms (neural networks, support vector machines, random forests and the like) have been widely applied to protein structure classification. These approaches first preprocess the data of a protein data set, which typically contains sequence information, secondary structure information, mutation information and so on. A conventional computer can store this protein data in classical bits by means such as one-hot encoding. The data set is then divided into a training data set and a test data set. Features are extracted from the training data set by a machine learning algorithm, a prediction model is obtained by training, and the accuracy of the model is then tested on the test data set.

The prior art uses machine learning models on classical computers to classify protein structures. A classical computer computes with classical bits, which differ from the quantum bits (qubits) used by a quantum computer. The encoding schemes used for proteins on classical computers do not reflect the intrinsic information of proteins well. The amino acid sequence of a protein has a sequential (time-series-like) character; if the protein is encoded onto qubits using a quantum computer, this sequential character can be well captured by the entanglement of the qubits. In addition, protein databases are large in number and variety, and classical computers have limited capacity for storing and processing data, so they cannot handle excessively large data sets.

Disclosure of Invention

In view of the above drawbacks of the prior art, the present invention provides a protein structure classification system based on a quantum convolutional neural network, which comprises: an encoding module for protein sequence amino acid feature data, a quantum convolution and pooling module, a loss function construction module, and a quantum circuit parameter update module, wherein:

the encoding module for protein sequence amino acid feature data is used to extract and read protein sequence information and the corresponding structural information from a protein structure classification data set;

the quantum convolution and pooling module is used to classify protein structures by means of parameterized quantum gates;

the loss function construction module is used to obtain a loss function that characterizes system performance;

the quantum circuit parameter update module is used to update the quantum circuit parameters.

Wherein the protein structure classification data set is divided into a training data set and a test data set at a ratio of 99:1.
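As an illustration only, a 99:1 division could be sketched as follows. The function name `split_dataset`, the fixed random seed, the shuffling step, and the placeholder records are assumptions made for this example; the source does not say how the division is performed.

```python
import random


def split_dataset(records, train_fraction=0.99, seed=0):
    """Shuffle a copy of the records and split it into (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(len(shuffled) * train_fraction))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical placeholder records standing in for (sequence id, label) entries.
records = [(f"seq{i}", i % 2) for i in range(1000)]
train, test = split_dataset(records)
print(len(train), len(test))  # 990 10
```

Shuffling before the split is one common design choice; a stratified split by structural class would be an alternative when classes are imbalanced.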

Wherein the quantum convolution and pooling module comprises:

a quantum convolution layer basic unit, used to evolve the quantum state loaded with the protein sequence feature information;

a quantum pooling layer basic unit, used to map the information of two qubits onto one qubit.

Wherein the quantum convolution and pooling module is further used to apply the quantum convolution layer and the quantum pooling layer alternately until only one qubit remains, and to measure the Pauli Z expectation value of that last qubit as the final predicted value of the protein structure classification.

Wherein the loss function construction module is specifically used to input the protein amino acid sequence feature data $x_i$ in each batch $b$ into the quantum convolution and pooling module, so that each protein amino acid sequence obtains a predicted value $\hat{y}_i$ from the quantum convolution and pooling module; a loss function characterizing system performance is then obtained by computing the mean squared error of the predicted values of all protein amino acid sequences in the batch with respect to their true labels $y_i$.

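Wherein the loss function is expressed by the following equation:

$$L_b = \frac{1}{K} \sum_{i=1}^{K} \left( \hat{y}_i - y_i \right)^2,$$

wherein $\hat{y}_i$ is the predicted value of the $i$-th protein amino acid sequence in batch $b$, $y_i$ is its true label, and $K$ is the number of protein amino acid sequences contained in batch $b$.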
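A minimal sketch of a batch mean squared error, averaged over the K sequences in a batch, is given below. The function name `mse_loss` and the example values are hypothetical, and encoding the labels as +1 and -1 (to match the range of a Pauli Z expectation value) is an assumption that the source does not state.

```python
def mse_loss(predictions, labels):
    """Mean squared error between predicted values and true labels for one batch."""
    if not predictions or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equally long")
    k = len(predictions)  # K: number of sequences in the batch
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / k


# Hypothetical batch: predictions are expectation values in [-1, 1].
preds = [0.8, -0.6, 0.1]
labels = [1.0, -1.0, 1.0]  # assumed +1 / -1 label encoding
loss = mse_loss(preds, labels)
print(loss)
```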

The quantum circuit parameter update module is specifically used to compute the analytic gradient of the loss function with respect to the quantum circuit parameters based on the parameter-shift rule for parameterized circuits, and then to update the quantum circuit parameters.

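Wherein computing the analytic gradient of the loss function with respect to the quantum circuit parameters based on the parameter-shift rule specifically includes:

assuming that the expectation value of a measurement operator $\hat{M}$ in the parameterized quantum circuit $U(\theta)$ can be expressed as

$$f(\theta) = \langle 0 | U^{\dagger}(\theta)\, \hat{M}\, U(\theta) | 0 \rangle,$$

wherein $U(\theta)$ represents the parameterized quantum circuit composed of the quantum convolution layers and pooling layers, and $\theta$ represents the parameters in the quantum convolution layers and pooling layers;

then the gradient of the expectation value function $f(\theta)$ with respect to the parameterized quantum circuit parameter $\theta$ can be expressed as

$$\nabla_{\theta} f(\theta) = \frac{1}{2} \left[ f\left(\theta + \frac{\pi}{2}\right) - f\left(\theta - \frac{\pi}{2}\right) \right].$$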
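The parameter-shift rule can be checked on a toy case. The sketch below assumes a single qubit, a single RY rotation acting on |0>, and a Pauli Z measurement, for which the expectation value is cos(theta) and the exact derivative is -sin(theta). This one-gate circuit is an illustrative stand-in, not the convolution and pooling circuit of the invention, and the function names are hypothetical.

```python
import math


def expectation_z(theta):
    """<Z> after RY(theta) acts on |0>; the amplitudes are (cos(theta/2), sin(theta/2))."""
    a = math.cos(theta / 2.0)
    b = math.sin(theta / 2.0)
    return a * a - b * b


def parameter_shift_gradient(f, theta, shift=math.pi / 2.0):
    """Gradient of f at theta from two shifted evaluations of f."""
    return 0.5 * (f(theta + shift) - f(theta - shift))


theta = 0.7
grad = parameter_shift_gradient(expectation_z, theta)
exact = -math.sin(theta)
print(grad, exact)
```

Unlike a finite-difference estimate, the two shifted evaluations give the exact derivative for gates generated by a Pauli operator, which is why the gradient is described as analytic.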

Wherein the system is trained for a plurality of epochs using the training data set until a desired accuracy is reached.

Compared with the prior art, the invention realizes an efficient quantum convolutional neural network system for quantum computers that can efficiently classify protein structures, and the model used by the system can greatly accelerate protein structure prediction and drug development.

Drawings

The above and other objects, features and advantages of exemplary embodiments of the present application will become readily apparent from the following detailed description, which proceeds with reference to the accompanying drawings. Several embodiments of the present application are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar or corresponding parts and in which:

FIG. 1 shows PSSM matrix data characterizing the amino acid feature attributes of proteins according to embodiments of the present invention;

FIG. 2 (a) is a diagram showing a quantum circuit that encodes the 20-dimensional data of a single amino acid of a protein sequence onto 10 qubits according to an embodiment of the present invention;

FIG. 2 (b) is a block diagram showing an encoded implementation of amino acid sequence profile data of the entire protein according to an embodiment of the present invention;

FIG. 3 (a) is a block diagram showing a basic cell implementation of a quantum convolutional layer according to an embodiment of the present invention;

FIG. 3 (b) is a block diagram showing a basic cell implementation of a quantum pooling layer according to an embodiment of the present invention;

FIG. 4 is a block diagram illustrating a protein structure classification system based on a quantum convolutional neural network according to an embodiment of the present invention;

FIG. 5 is a block diagram illustrating a protein structure classification system based on a quantum convolutional neural network according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and "a plurality" typically includes at least two.

It should be understood that while the terms first, second, third, etc. may be used in embodiments of the present invention to describe … …, these … … should not be limited to these terms. These terms are used only to distinguish … …. For example, a first … … may also be referred to as a second … …, and similarly, a second … … may also be referred to as a first … …, without departing from the scope of embodiments of the present invention.

It should be understood that the term "and/or" as used herein merely describes an association between associated objects and indicates that three relationships may exist; e.g., A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship.

The words "if", as used herein, may be interpreted as "at … …" or "at … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrase "if determined" or "if detected (a stated condition or event)" may be interpreted as "upon determining" or "in response to determining" or "upon detecting (a stated condition or event)" or "in response to detecting (a stated condition or event)", depending on the context.

It is also noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that an article or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such article or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another like element in a commodity or device comprising the element.

The related terms of the present application:

PDB (Protein Data Bank): protein database

NISQ (Noisy Intermediate-Scale Quantum): noisy intermediate-scale quantum computer

SCOP (Structural Classification of Proteins): protein structure classification database

PSSM (Position-Specific Scoring Matrix): position-specific scoring matrix

The quantum convolutional neural network loads the feature vectors representing the amino acids of the protein sequence into a quantum state by amplitude encoding, and then processes the quantum state containing the protein sequence amino acid feature information through quantum convolution layers and quantum pooling layers, which correspond to classical convolution and pooling respectively. In this process the number of qubits is progressively reduced until a single qubit is measured. The information obtained from the measurement and the true label of the structural classification of the protein are combined into a loss function, and the parameters are updated repeatedly according to the loss function until a satisfactory threshold is reached.
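For orientation, the textbook form of amplitude encoding normalizes a feature vector to unit length and pads it with zeros to a power-of-two number of amplitudes, as sketched below. This is an assumption-laden illustration: in this generic form 20 values fit into 5 qubits (32 amplitudes), whereas the embodiments load the data onto 10 qubits using rotation gates, and that circuit is not reproduced here. The function name and the feature values are hypothetical.

```python
import math


def amplitude_encode(features):
    """Return (amplitudes, n_qubits): a unit-norm list zero-padded to length 2**n_qubits."""
    norm = math.sqrt(sum(x * x for x in features))
    if norm == 0.0:
        raise ValueError("cannot encode an all-zero vector")
    n_qubits = max(1, math.ceil(math.log2(len(features))))
    padded = list(features) + [0.0] * (2 ** n_qubits - len(features))
    return [x / norm for x in padded], n_qubits


# Hypothetical 20-dimensional feature vector for one amino acid (not real PSSM data).
features = [float(i + 1) for i in range(20)]
amplitudes, n_qubits = amplitude_encode(features)
print(n_qubits, len(amplitudes))  # 5 32
```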

Alternative embodiments of the present invention are described in detail below with reference to the accompanying drawings.

Example I

The invention provides a protein structure classification system based on a quantum convolutional neural network, which comprises: an encoding module for protein sequence amino acid feature data, a quantum convolution and pooling module, a loss function construction module, and a quantum circuit parameter update module.

The encoding module for protein sequence amino acid feature data is used to extract protein sequence information and the corresponding structural information from a protein structure classification data set (SCOP or the like). The data set is divided into a training data set and a test data set at a ratio of 99:1. The amino acid sequence information in the data set is encoded as twenty-dimensional vectors using the PSSM method. Rotation gates are then applied to the qubits (10 qubits are used here) to load the 20-dimensional protein sequence amino acid feature data onto the amplitudes of the quantum state.

The quantum convolution and pooling module uses parameterized quantum gates to build a quantum convolution layer basic unit and a quantum pooling layer basic unit, each acting on two qubits. The quantum convolution layer basic unit is applied to each pair of qubits in the quantum system to form a quantum convolution layer, which evolves the quantum state loaded by the previous module with the protein sequence amino acid feature information. The quantum pooling layer basic unit is then applied to each pair of qubits to form a quantum pooling layer, which maps the information of two qubits onto one qubit; after this step, 5 qubits carry the high-level information. The quantum convolution layer and the quantum pooling layer are then applied to the remaining 5 qubits, after which 3 qubits carry the high-level information. These steps are repeated, alternating quantum convolution and quantum pooling layers, until only one qubit remains. Finally, the Pauli Z expectation value of the last qubit is measured and used as the final predicted value of the protein structure classification.
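The reduction in qubit count described here (10, then 5, then 3, and so on down to 1) is consistent with each pooling layer halving the number of active qubits and carrying an unpaired qubit forward. The sketch below computes that schedule under this assumption; the counts after 3 are inferred rather than stated in the source, and `pooling_schedule` is a hypothetical helper name.

```python
def pooling_schedule(n_qubits):
    """Active qubit counts after each convolution-plus-pooling stage, ending at 1."""
    if n_qubits < 1:
        raise ValueError("n_qubits must be at least 1")
    counts = [n_qubits]
    while counts[-1] > 1:
        # Each pooling unit maps two qubits to one; an unpaired qubit is kept as is.
        counts.append((counts[-1] + 1) // 2)
    return counts


schedule = pooling_schedule(10)
print(schedule)  # [10, 5, 3, 2, 1]
```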

The loss function construction module is used to input the protein amino acid sequence feature data $x_i$ in each batch $b$ into the quantum convolutional neural network built by the previous module, so that each protein amino acid sequence obtains a predicted value $\hat{y}_i$ from the previous module; a loss function characterizing the performance of the model is then obtained by computing the mean squared error of the predicted values of all protein amino acid sequences in the batch with respect to their true labels $y_i$.

The quantum circuit parameter update module is used to compute the analytic gradient of the loss function of the previous module with respect to the quantum circuit parameters based on the existing parameter-shift rule for parameterized circuits, and then to update the quantum circuit parameters using a classical computer. Finally, the system is trained for a plurality of epochs using the protein amino acid sequence training data set, and updating stops once the desired accuracy is reached.

Example II

In order to further illustrate the method for predicting the protein structure based on the quantum convolution neural network, the following embodiments are provided:

The encoding module for protein sequence amino acid feature data is based on PSSM matrix data characterizing the amino acid feature attributes of each protein, as shown in FIG. 1, where each letter represents one of the 20 amino acids and each amino acid has a feature vector of dimension 20. Based on the feature attribute data of each amino acid of the protein, rotation gates load the protein amino acid sequence data onto the amplitudes of the quantum state (10 qubits are used here); the specific quantum circuits are shown in FIG. 2. FIG. 2 (a) shows the quantum circuit that encodes the 20-dimensional data of a single amino acid of a protein sequence onto 10 qubits. FIG. 2 (b) shows the encoding of the feature data of the entire protein amino acid sequence, taking the protein amino acid sequence shown in the figure as an example. As the figure shows, the 20-dimensional feature data characterizing methionine (M) is first encoded onto the amplitudes of the 10-qubit quantum state in the manner of FIG. 2 (a). The feature data of threonine (T) is then encoded onto the quantum state, and so on until the entire protein sequence has been encoded.

The quantum convolution and pooling module comprises a quantum convolution layer basic unit and a quantum pooling layer basic unit, each acting on two qubits. The corresponding quantum circuits are shown in FIG. 3: FIG. 3 (a) is an implementation of the quantum convolution layer basic unit, and FIG. 3 (b) is an implementation of the quantum pooling layer basic unit. Based on these two basic units, the final predicted value can be obtained by applying the quantum convolution layer and the quantum pooling layer alternately. Specifically, as shown in FIG. 4, a block C in the figure represents the quantum convolution layer basic unit of FIG. 3 (a), and a block P represents the quantum pooling layer basic unit of FIG. 3 (b); the C blocks within a dashed frame form the first quantum convolution layer, and the P blocks within a dashed frame form the first quantum pooling layer. A truncated qubit line indicates that no quantum convolution layer or pooling layer basic unit acts on that qubit. As shown in FIG. 4, alternating the quantum convolution and quantum pooling layers finally concentrates the information containing the amino acid features of the protein sequence on one qubit; measuring the Pauli Z expectation value of this last qubit gives the final predicted value, which indicates whether or not the protein amino acid sequence has an alpha-helix structure.
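For a single qubit in a pure state with amplitudes (a0, a1), the Pauli Z expectation value is |a0|^2 - |a1|^2, which lies in [-1, 1]. The sketch below computes it and converts it to a yes/no answer. The zero threshold and the convention that a non-negative value means "alpha helix" are assumptions for illustration only; the source does not specify how the expectation value is mapped to a class, and in the full circuit the last qubit would generally be in a mixed state rather than the pure state used here.

```python
def pauli_z_expectation(amplitudes):
    """<Z> = |a0|^2 - |a1|^2 for a single-qubit pure state (normalized defensively)."""
    a0, a1 = amplitudes
    p0 = abs(a0) ** 2
    p1 = abs(a1) ** 2
    total = p0 + p1
    if total == 0.0:
        raise ValueError("state vector must be non-zero")
    return (p0 - p1) / total


def is_alpha_helix(expectation, threshold=0.0):
    """Hypothetical decision rule: a value at or above the threshold counts as alpha helix."""
    return expectation >= threshold


# Hypothetical final-qubit state with probabilities 0.36 and 0.64.
state = (complex(0.6, 0.0), complex(0.0, 0.8))
ez = pauli_z_expectation(state)
print(ez, is_alpha_helix(ez))
```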

The loss function construction module inputs the protein amino acid sequence feature data $x_i$ in each batch $b$ into the quantum convolutional neural network built by the previous module, and a corresponding predicted value $\hat{y}_i$ is obtained for each protein amino acid sequence. Finally, the predicted values $\hat{y}_i$ of all protein amino acid sequences in the batch are combined with the true labels $y_i$, which indicate whether or not each sequence corresponds to an alpha-helix structure, to compute the mean squared error between the predicted and true values. This gives the loss function characterizing the performance of the quantum convolutional neural network model. The mean squared error loss function is given by the following formula:

$$L_b = \frac{1}{K} \sum_{i=1}^{K} \left( \hat{y}_i - y_i \right)^2, \tag{1}$$

wherein $K$ is the number of protein amino acid sequences contained in batch $b$.

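The quantum circuit parameter update module: first, the expectation value of a measurement operator $\hat{M}$ in the parameterized quantum circuit $U(\theta)$ can be expressed as

$$f(\theta) = \langle 0 | U^{\dagger}(\theta)\, \hat{M}\, U(\theta) | 0 \rangle. \tag{2}$$

Then the gradient of the expectation value function $f(\theta)$ with respect to the parameterized quantum circuit parameter $\theta$ can be expressed as

$$\nabla_{\theta} f(\theta) = \frac{1}{2} \left[ f\left(\theta + \frac{\pi}{2}\right) - f\left(\theta - \frac{\pi}{2}\right) \right]. \tag{3}$$

In formula (3) above, $\theta$ represents the parameters in the quantum convolution layers and the pooling layers.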

The above method of computing the analytic gradient of the expectation value of an operator with respect to the parameters of a parameterized quantum circuit is called the parameter-shift rule.

Using the parameter-shift rule, the analytic gradient of the mean squared error loss function of the previous module, namely formula (1), with respect to the quantum circuit parameters of the quantum convolution layers and pooling layers can be obtained. The parameters are then updated by gradient descent using a classical computer. Finally, the system is trained for a plurality of epochs on the training data set of protein amino acid sequence feature data until the protein structure classification predicted by the quantum convolutional neural network provided by this patent reaches the desired accuracy.
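The training procedure (parameter-shift gradient, chain rule through the mean squared error, gradient descent on a classical computer) can be sketched end to end on a toy model. Everything specific below is an assumption for illustration: the one-qubit model whose prediction is cos(x + theta), the synthetic batch generated with a target parameter of 0.5, the learning rate, and the epoch count. It is not the circuit or the data of the invention.

```python
import math

SHIFT = math.pi / 2.0


def predict(x, theta):
    """Toy model: <Z> after RY(x) then RY(theta) on |0>, which equals cos(x + theta)."""
    return math.cos(x + theta)


def prediction_gradient(x, theta):
    """Derivative of the prediction with respect to theta via the parameter-shift rule."""
    return 0.5 * (predict(x, theta + SHIFT) - predict(x, theta - SHIFT))


def loss(batch, theta):
    """Mean squared error of the predictions over the batch."""
    return sum((predict(x, theta) - y) ** 2 for x, y in batch) / len(batch)


def loss_gradient(batch, theta):
    """Chain rule: dL/dtheta = (2/K) * sum((prediction - label) * d(prediction)/dtheta)."""
    total = sum(2.0 * (predict(x, theta) - y) * prediction_gradient(x, theta)
                for x, y in batch)
    return total / len(batch)


def train(batch, theta=0.0, learning_rate=0.2, epochs=200):
    """Plain gradient descent; returns the final parameter and the loss history."""
    history = [loss(batch, theta)]
    for _ in range(epochs):
        theta -= learning_rate * loss_gradient(batch, theta)
        history.append(loss(batch, theta))
    return theta, history


# Synthetic batch (hypothetical): labels produced by the same model with theta = 0.5.
batch = [(x, math.cos(x + 0.5)) for x in (0.0, 0.4, 0.9, 1.3)]
theta, history = train(batch)
print(theta, history[0], history[-1])
```

In a real setting the stopping criterion would be the accuracy on held-out data rather than a fixed number of epochs.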

Example III

As shown in FIG. 5, the present invention provides a protein structure classification system based on a quantum convolutional neural network, which comprises: an encoding module for protein sequence amino acid feature data, a quantum convolution and pooling module, a loss function construction module, and a quantum circuit parameter update module.

Wherein the quantum convolution and pooling module comprises a quantum convolution layer basic unit and a quantum pooling layer basic unit.

Wherein the quantum convolution and pooling module is further used to apply the quantum convolution layer and the quantum pooling layer alternately until only one qubit remains, and to measure the Pauli Z expectation value of that last qubit as the final predicted value of the protein structure classification.

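Wherein the loss function is expressed by the following equation:

$$L_b = \frac{1}{K} \sum_{i=1}^{K} \left( \hat{y}_i - y_i \right)^2,$$

wherein $\hat{y}_i$ is the predicted value of the $i$-th protein amino acid sequence in batch $b$, $y_i$ is its true label, and $K$ is the number of protein amino acid sequences contained in batch $b$.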

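Assume that the expectation value of a measurement operator $\hat{M}$ in the parameterized quantum circuit $U(\theta)$ can be expressed as

$$f(\theta) = \langle 0 | U^{\dagger}(\theta)\, \hat{M}\, U(\theta) | 0 \rangle,$$

wherein $\theta$ represents the parameters in the quantum convolution layers and pooling layers;

then the gradient of the expectation value function $f(\theta)$ with respect to the parameterized quantum circuit parameter $\theta$ can be expressed as

$$\nabla_{\theta} f(\theta) = \frac{1}{2} \left[ f\left(\theta + \frac{\pi}{2}\right) - f\left(\theta - \frac{\pi}{2}\right) \right].$$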

Example IV

Embodiments of the present invention provide a non-volatile computer storage medium, where computer-executable instructions are stored, and the computer-executable instructions may perform the method steps described in the above embodiments.

It should be noted that the computer readable medium mentioned above in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

The computer readable medium may be embodied in the electronic device; or may be separate and not incorporated into the electronic device.

Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present invention may be implemented by software or hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself.

The foregoing describes preferred embodiments of the present invention. It is intended to illustrate the spirit and scope of the invention clearly and concisely, not to limit them; all modifications, substitutions, and alterations falling within the spirit and scope of the invention as defined by the appended claims are included.

Claims

1. A protein structure classification system based on a quantum convolutional neural network, comprising: an encoding module for protein sequence amino acid feature data, a quantum convolution and pooling module, a loss function construction module, and a quantum circuit parameter update module, wherein:

the encoding module for protein sequence amino acid feature data is used to read protein sequence information and the corresponding structural information from a protein structure classification data set;

the quantum convolution and pooling module is used to classify protein structures by means of parameterized quantum gates;

the quantum circuit parameter update module is used to update the quantum circuit parameters;

wherein the quantum convolution and pooling module comprises:

a quantum pooling layer basic unit, used to map the information of two qubits onto one qubit;

the quantum convolution and pooling module is further used to apply the quantum convolution layer and the quantum pooling layer alternately until only one qubit remains, and to measure the Pauli Z expectation value of that last qubit as the final predicted value of the protein structure classification.

2. The protein structure classification system based on a quantum convolutional neural network of claim 1, wherein the protein structure classification data set is divided into a training data set and a test data set at a ratio of 99:1.

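3. The protein structure classification system based on a quantum convolutional neural network of claim 1, wherein the loss function construction module is specifically used to input the protein amino acid sequence feature data $x_i$ in each batch $b$ into the quantum convolution and pooling module, so that each protein amino acid sequence obtains a predicted value $\hat{y}_i$ from the quantum convolution and pooling module, and then to obtain a loss function characterizing system performance by computing the mean squared error of the predicted values of all protein amino acid sequences in the batch with respect to their true labels $y_i$.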

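4. The protein structure classification system based on a quantum convolutional neural network of claim 3, wherein the loss function is expressed as:

$$L_b = \frac{1}{K} \sum_{i=1}^{K} \left( \hat{y}_i - y_i \right)^2,$$

wherein $\hat{y}_i$ is the predicted value of the $i$-th protein amino acid sequence in batch $b$, $y_i$ is its true label, and $K$ is the number of protein amino acid sequences contained in batch $b$.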

5. The system of claim 1, wherein the quantum circuit parameter update module is specifically used to compute the analytic gradient of the loss function with respect to the quantum circuit parameters based on the parameter-shift rule for parameterized circuits, and then to update the quantum circuit parameters.

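6. The system of claim 1, wherein computing the analytic gradient of the loss function with respect to the quantum circuit parameters based on the parameter-shift rule for parameterized circuits comprises:

assuming that the expectation value of a measurement operator $\hat{M}$ in the parameterized quantum circuit $U(\theta)$ can be expressed as

$$f(\theta) = \langle 0 | U^{\dagger}(\theta)\, \hat{M}\, U(\theta) | 0 \rangle,$$

wherein $\theta$ represents the parameters in the quantum convolution layers and pooling layers;

then the gradient of the expectation value function $f(\theta)$ with respect to the parameterized quantum circuit parameter $\theta$ can be expressed as

$$\nabla_{\theta} f(\theta) = \frac{1}{2} \left[ f\left(\theta + \frac{\pi}{2}\right) - f\left(\theta - \frac{\pi}{2}\right) \right].$$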

7. The protein structure classification system based on a quantum convolutional neural network of claim 2, wherein the system is trained for a plurality of epochs using the training data set until a desired accuracy is reached.